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Forward 

The  proposed  research  is  to  develop  Gaussian  random  fields  methods  to  study  fork-join  net¬ 
works  (FJNs)  with  synchronization  constraints.  FJNs  arise  from  many  military  operations,  e.g., 
Army  force  deployment  and  counter-terrorism,  where  commands  come  from  one  or  multiple  types 
of  operations  and  each  operation  requires  multiple  parallel  and/or  sequential  tasks  to  be  processed 
in  service  stations  with  multiple  servers,  and  to  be  rejoined  for  further  processing  with  synchro¬ 
nization  constraints,  e.g.,  non-exchangeability.  In  this  research,  we  focus  on  the  non-exchangeable 
synchronization  constraint,  which  requires  that  tasks  can  only  be  synchronized  only  if  all  tasks  of 
the  same  job  are  completed.  The  main  mathematical  challenge  lies  in  the  resequencing  of  arrival 
orders  after  service  completion  at  each  station,  which  requires  an  infinite  dimensional  state  space 
to  track  the  status  of  all  parallel  tasks  for  each  job.  That  was  an  extremely  difficult  open  problem. 

We  have  developed  a  novel  method  using  multiparameter  sequential  empirical  processes  driven 
by  service  vectors  of  parallel  tasks  of  each  job  to  describe  the  system  dynamics  of  FJNs.  This 
research  has  produced  two  research  papers,  focusing  on  a  single  class  FJN  in  two  asymptotic  regimes, 
where  the  arrival  rate  of  jobs  and  the  number  of  servers  in  each  station  get  large  appropriately.  We 
consider  the  number  of  tasks  in  each  waiting  buffer  for  synchronization,  jointly  with  the  number 
of  tasks  in  each  parallel  service  station  and  the  number  of  synchronized  jobs.  In  the  first  paper, 
we  consider  the  quality-driven  regime,  and  show  that  all  the  limiting  processes  are  functionals  of 
two  independent  processes  -  the  limiting  arrival  process  and  a  generalized  Kiefer  process  driven 
by  the  service  vector  of  each  job.  We  characterize  the  transient  and  stationary  distributions  of 
the  limiting  processes.  In  the  second  paper,  we  consider  the  quality-and-efficiency-driven  regime 
(Halfin- Whitt  regime),  and  show  that  all  the  limit  processes  in  the  functional  central  limit  theorem 
are  also  characterized  via  functionals  of  the  initial  limit  quantities,  the  arrival  limit  process  and 
a  generalized  multiparameter  Kiefer  process  driven  by  the  service  vectors.  This  new  framework 
is  being  further  generalized  to  analyze  fork-join  networks  with  multiple  classes  of  jobs,  and  study 
control,  reliability  and  provisioning  problems. 
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1  Statement  of  the  Research  Problem 


Fork-join  networks  consist  of  a  set  of  service  stations  that  serve  job  requests  simultaneously  and 
sequentially  according  to  pre-designated  deterministic  precedence  constraints.  Such  networks  have 
many  applications  in  manufacturing  and  telecommunications  [4,  16,  25,  26,  27,  43,  53,  36,  37,  49], 
patient  flow  analysis  in  healthcare  [22,  1,  2,  57,  58],  parallel  computing  [47,  52,  51,  32],  military 
deployment  operations  [24,  56],  and  law  enforcement  systems  [29].  Two  types  of  synchronization 
constraints  are  of  particular  interest.  One  is  called  exchangeable  synchronization  (ES)  in  which 
tasks  are  not  tagged  with  a  particular  job  and  can  be  synchronized  for  a  service  completion  once 
the  necessary  tasks  are  completed.  This  type  of  synchronization  constraint  is  often  used  in  manufac¬ 
turing  systems;  for  example,  in  many  assembly  systems,  different  parts  of  a  product  are  processed 
at  separate  workstations  or  plant  locations  and  a  product  will  be  assembled  once  all  of  its  neces¬ 
sary  parts  are  completed.  In  this  case,  the  parts  are  not  tagged  with  a  particular  product,  since 
they  are  standardized  for  the  same  type  of  product.  The  second  type  is  called  non- exchangeable 
synchronization  (NES).  Tasks  are  tagged  with  a  particular  job  and  can  only  be  synchronized  when 
all  the  parallel  tasks  of  the  same  job  are  completed. 


Figure  1:  A  fundamental  fork-join  network 

Fork-join  networks  with  NES  are  used  in  many  applications,  including  healthcare  systems, 
parallel  computing,  MapReducing  scheduling  (e.g.,  large-scale  parallel  Web  search),  disassembly 
and  reassembly  systems  in  manufacturing  and  so  on.  In  patient  flows  of  hospitals  [1,  2,  22,  57,  58], 
the  treatment  and  discharge  processes  are  typical  examples  of  fork-join  networks  with  NES:  a 
patient  must  have  all  test  results  ready  before  a  doctor  examination  and  these  tests  are  conducted 
in  different  units/laboratories  and  can  never  be  mixed;  a  patient,  after  the  discharge  decision  is 
made,  must  wait  for  necessary  procedures,  pharmacy,  transportation,  etc.,  before  being  physically 
discharged.  In  MapReduce  scheduling  [11,  32,  51,  54],  jobs  are  processed  in  two  phases:  in  the 
map  phase,  a  large-scale  data  input  (e.g.,  Web  processing  data)  is  distributed  into  individual 
computation  nodes,  and  each  node  processes  one  block  of  input  data,  and  after  the  execution  of  all 
blocks  of  the  same  data  input,  they  will  be  joined  as  an  output  in  the  reduce  phase. 

Despite  the  vast  appealing  applications  of  such  networks,  very  little  has  been  known  about  their 
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behaviors  in  the  many-server  heavy-traffic  regimes.  We  start  considering  a  fundamental  fork-join 
network  model  with  a  single  class  of  jobs  and  NES,  where  each  arriving  job  is  forked  into  several 
parallel  tasks  upon  arrival  and  each  of  the  tasks  is  processed  in  parallel  at  a  dedicated  service 
station  with  multiple  servers  under  the  non-idling  FCFS  discipline.  Upon  service  completion,  each 
task  will  join  a  buffer  associated  with  its  service  station,  and  wait  for  synchronization,  such  that 
each  job  is  synchronized  only  if  all  of  its  tasks  have  been  completed.  Figure  1  depicts  such  a 
model.  In  this  model,  in  addition  to  the  service  dynamics,  we  are  interested  in  the  waiting  buffer 
dynamics  for  synchronization.  One  important  performance  measure  is  the  response  time  of  a  job, 
namely,  the  time  from  arrival  to  synchronization.  The  response  time  may  also  include  the  time 
required  for  the  synchronization  process,  but  we  do  not  consider  that  in  this  work.  Thus,  the 
response  time  includes  two  delays,  waiting  time  for  service  and  waiting  time  for  synchronization. 
Since  each  service  station  can  be  regarded  as  a  separate  many-server  queue,  the  waiting  time  for 
service  has  been  well  understood.  However,  the  waiting  time  for  synchronization,  which  is  our  focus 
in  this  paper,  has  not  been  studied.  Specifically,  we  investigate  the  waiting  buffer  dynamics  for 
synchronization  jointly  with  the  service  dynamics. 

The  main  mathematical  challenge  lies  in  the  resequencing  of  the  arrival  orders  after  service 
completion  at  each  service  station,  due  to  the  randomness  of  the  service  times  and  the  multi-server 
setting.  When  there  is  a  single  server  in  each  of  the  parallel  service  station  and  the  service  discipline 
is  FCFS,  the  service  completion  order  is  preserved  to  be  the  same  as  the  arrival  order  of  tasks  in 
each  service  station,  so  that  the  two  types  of  synchronization  constraints  are  equivalent.  However, 
the  arrival  order  of  tasks  in  each  service  station  can  be  resequenced  at  the  service  completion 
epochs  when  the  number  of  servers  in  a  service  station  is  larger  than  one  or  the  service  discipline 
is  not  FCFS.  Resequencing  has  been  one  of  the  most  difficult  obstacles  in  the  study  of  fork-join 
networks.  Some  limited  work  has  been  dedicated  to  the  study  of  such  challenging  problems.  For 
example,  substantial  efforts  were  dedicated  to  the  study  of  the  max-plus  recursions  [21,  3,  12]. 
More  recently,  Atar  et  al.  [2]  have  studied  a  fork-join  network  with  single-server  service  stations 
where  tasks  may  reenter  for  service  at  some  service  stations  in  a  Bernoulli  mechanism  so  that 
the  arrival  orders  of  tasks  at  each  service  station  are  resequenced  after  service  completion.  They 
show  that  under  a  priority  discipline,  the  system  dynamics  with  NES  is  asymptotically  equivalent 
to  that  with  ES  in  the  conventional  (single-server)  heavy-traffic  regime.  For  a  Markovian  fork- 
join  network  with  multiple  servers,  Zviran  [58]  shows  that  the  system  dynamics  with  NES  is  also 
asymptotically  equivalent  to  that  with  ES  in  the  conventional  heavy-traffic  regime.  However,  the 
two  types  of  synchronization  constraints  lead  to  very  different  system  dynamics  when  the  service 
stations  have  many  parallel  servers  in  the  Halfin- Whitt  regime,  as  conjectured  in  [2,  58].  To  the 
best  of  our  knowledge,  our  work  is  the  first  to  tackle  the  resequencing  problem  in  non-Mar kovian 
fork-join  networks  with  NES  and  multiple-server  service  stations  in  the  many-server  heavy-traffic 
regimes.  We  will  consider  both  cases  when  each  service  station  is  operating  in  the  quality- driven 
(QD)  regime,  or  in  the  quality-and-efficiency-driven  (QED,  Halfin- Whitt)  regimes. 

When  all  the  service  stations  operate  in  the  QD  regime,  this  is  equivalent  to  a  model  which  has 
infinite  numbers  of  servers  at  all  service  stations  asymptotically.  To  describe  the  system  dynamics, 
we  can  start  with  a  graphical  representation  as  shown  in  Figure  2(a)  for  a  system  of  two  parallel 
tasks.  At  each  job’s  arrival  epoch,  we  mark  the  arrival  time  on  the  horizontal  line  (x-axis)  and 
the  service  times  of  all  parallel  tasks  on  the  vertical  line  (y-axis).  At  each  time  t,  by  drawing 
a  negative  forty-five  degree  line,  we  can  count  the  numbers  of  tasks  in  each  service  station  and 
each  waiting  buffer  for  synchronization.  When  the  arrival  process  is  Poisson,  we  can  apply  Poisson 


4 


0  Arrival  time  t 

(a)  The  QD  Regime 


Figure  2:  Graphical  representations  of  the  system  dynamics  in  the  QD  and  QED  regimes 


random  measure  theory,  similarly  as  in  the  “physics”  of  M/GI/oo  queues  [14].  It  can  be  shown 
that  at  each  time  t,  the  numbers  of  tasks  in  each  service  station  and  each  waiting  buffer  for 
synchronization  all  have  Poisson  distributions  and  their  parameter  values  and  covariances  can  also 
be  obtained;  see  Proposition  2.1.  However,  when  the  arrival  process  is  more  general,  this  Poisson 
random  measure  approach  does  not  work,  and  we  cannot  obtain  the  exact  distributions  for  these 
performance  measures.  Thus,  we  consider  heavy-traffic  approximations  of  the  system  dynamics 
when  the  arrival  rate  is  relatively  large.  For  that,  the  graphical  representation  in  Figure  2(a)  also 
plays  an  important  role;  see  the  system’s  dynamic  equations  in  §2. 

Here  we  develop  a  new  approach  to  describe  the  system  dynamics.  Both  the  service  dynamics 
and  the  waiting  buffer  dynamics  for  synchronization  are  represented  as  functionals  of  the  mul¬ 
tiparameter  sequential  empirical  process  driven  by  the  service  vector  of  all  parallel  tasks.  Their 
diffusion-scaled  processes  converge  weakly  to  limit  processes  that  can  be  all  represented  as  function¬ 
als  of  two  independent  processes  -  the  limiting  arrival  process  and  the  multiparameter  generalized 
Kiefer  process  driven  by  the  service  vector.  When  the  limiting  arrival  process  is  Brownian  motion, 
we  show  that  the  aforementioned  limiting  processes  are  a  multidimensional  continuous  Gaussian 
process,  and  thus  characterize  the  joint  transient  and  stationary  distributions  of  these  processes. 
We  also  study  the  impact  of  the  correlation  among  the  service  vector  upon  these  distributions. 

There  are  several  advantages  with  this  new  approach.  It  gives  a  clean  and  elegant  representation 
of  the  limiting  processes,  involving  only  two  independent  stochastic  processes  arising  from  the  arrival 
and  service  processes.  Moreover,  the  characterization  of  the  limiting  processes  as  Gaussian  and  their 
transient  and  stationary  distributional  properties  can  be  easily  obtained.  Furthermore,  this  new 
approach  paves  the  way  to  study  the  fork-join  network  with  all  the  service  stations  operating  in  the 
QED  regime.  We  believe  that  this  new  approach  launches  a  new  framework  to  study  more  general 
fork-join  networks,  for  example,  multiclass  models,  and  when  the  service  vectors  for  parallel  tasks 
form  a  stationary  and  weakly  dependent  sequence. 

When  all  the  service  stations  are  in  the  QED  regime,  we  exploit  the  delicate  relationship  between 
finite-server  models  and  its  corresponding  infinite-server  models.  This  was  exploited  to  prove 
an  FCLT  for  the  GI/GI/n  queue  by  Reed  [45].  We  make  an  important  observation  that  the 
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multidimensional  processes  of  the  waiting  buffer  dynamics  for  synchronization  and  the  service 
dynamics  in  the  fork-join  network  can  be  represented  through  the  corresponding  processes  in  the 
infinite-server  case.  Thus,  our  results  from  the  QD  regime  can  be  extended  to  establish  the  FCLT 
for  the  fork-join  network  in  the  QED  regime.  To  illustrate,  we  can  also  use  a  similar  graphical 
representation  as  in  Figure  2(a)  to  describe  the  system  dynamics.  In  particular,  as  shown  in  Figure 
2(b),  we  mark  the  entering  service  times  of  all  parallel  tasks  for  each  job  on  the  horizontal  line  (x- 
axis),  and  the  service  times  of  them  on  the  vertical  line  (y-axis).  However,  unlike  the  infinite-server 
case,  tasks  of  the  same  job  may  not  enter  service  simultaneously.  Fortunately,  it  is  well  known 
that  the  delay  for  service  in  the  QED  regime  is  0{\/y/n)\  see,  e.g.,  [45,  50].  This  asymptotically 
negligible  difference  among  entering  service  times  helps  us  to  establish  the  FCLT  for  the  fork-join 
network  in  the  QED  regime. 

An  important  implication  of  our  results  is  that  the  size  of  the  waiting  buffer  for  synchronization 
is  of  the  same  order  as  that  of  the  total  number  of  tasks  at  each  service  station,  and  thus,  the  waiting 
time  for  synchronization  is  of  the  same  order  as  the  service  time,  0(1).  Namely,  the  response  time 
in  the  QED  regime  includes  the  delay  for  service  0(l/y/n),  the  service  time  0(1)  and  the  delay 
for  synchronization  0(1).  It  remains  to  establish  the  FCLT  for  the  (virtual)  waiting  time  process 
for  synchronization.  More  importantly,  it  remains  to  find  an  optimal  scheduling  policy  that  will 
minimize  the  delay  for  synchronization  in  the  single-class  case.  We  believe  that  our  methods  and 
results  will  provide  useful  insights  towards  that  direction. 

In  the  development  of  approximations  to  the  fork-join  system,  we  make  a  fundamental  contri¬ 
bution  to  the  study  of  multiparameter  sequential  empirical  processes  driven  by  random  vectors. 
Sequential  empirical  processes  driven  by  a  sequence  of  random  vectors  (allowing  for  correlation 
among  random  variables  in  the  vector)  and  their  limits  as  generalized  Kiefer  processes  have  been 
studied  in  the  statistics  literature;  see  e.g.,  [42,  6,  8,  9,  13],  but  the  convergence  is  proved  in  the 
space  .D([0,T]fc,M)  of  real-valued  cadlag  functions  defined  on  [0,T]fc,  k  >  2,  endowed  with  the 
generalized  Skorohod  Ji  topology  in  [35]  and  [48].  In  our  setting,  it  is  necessary  to  prove  the  con¬ 
vergence  in  the  space  D([0,  T],  D([0,  T]fc,  M))  of  function-valued  cadlag  functions  defined  on  [0,  T], 
endowed  with  the  standard  Skorohod  J\  topology  for  D([0,  T]k,  M)-valued  cadlag  functions. 

Literature  review.  Most  of  the  literature  on  fork-join  networks  is  on  models  with  single-server 
service  stations.  We  only  give  a  brief  summary  here  on  relevant  work  in  heavy  traffic.  These 
studies  are  in  the  conventional  (single-server)  heavy-traffic  regime.  In  Varma’s  dissertation  [53], 
the  diffusion-scaled  workload  processes  and  unsynchronized  queueing  processes  in  some  fork-join 
network  models  with  ES  are  shown  to  converge  weakly  to  certain  multi-dimensional  reflected  Brow¬ 
nian  motions.  The  stationary  distributions  of  the  system  response  time  and  the  processes  counting 
the  number  of  tasks  in  unsynchronized  queues  are  specified  by  some  partial  differential  equations 
(PDEs).  Nguyen  [36]  shows  the  diffusion-scaled  processes  counting  the  queue  lengths  at  each  service 
station  of  a  single-class  fork-join  network  model  with  ES  converge  to  a  reflected  Brownian  motion 
in  a  polyhedral  cone  of  the  nonnegative  orthant.  Nguyen  [37]  discusses  the  difficult  challenges  with 
multiclass  fork-join  models  with  ES.  As  we  have  noted  above,  for  a  fork-join  network  with  feedback 
and  NES,  Atar  et  al.  [2]  show  that  a  dynamic  priority  discipline  achieves  throughput  optimal¬ 
ity  asymptotically  in  the  conventional  heavy-traffic  regime,  as  a  consequence  of  the  asymptotic 
equivalence  between  NES  and  ES  constraints. 

Very  little  work  has  been  done  for  fork-join  networks  with  multi-server  service  stations.  Ko 
and  Serfozo  [25]  consider  a  fork-join  network  model  with  a  single  class  of  Poisson  arrivals  and  K 
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parallel  service  stations  with  multiple  servers  at  each  station  and  exponential  service  times,  and 
obtain  an  approximation  for  the  distribution  of  the  system  response  time  in  equilibrium  under 
the  NES  constraint.  Dai  [10]  provides  an  exact  simulation  algorithm  to  approximate  the  system 
response  time  in  equilibrium  for  the  same  Markovian  model  in  [25]  by  using  a  “coupling  from  the 
past”  method.  Zviran  [58]  studies  optimal  control  of  multi-server  feedforward  fork-join  networks 
with  exponential  service  times  in  the  conventional  heavy-traffic  regime  and  shows  that  FCFS  is 
asymptotically  optimal  and  the  resequencing  disruption  becomes  asymptotically  negligible.  Zaied 
[57]  calculates  mean  offered-load  functions  of  fork-join  networks  with  NES  and  multiple  processing 
stages  when  the  arrival  process  is  time-inhomogeneous  Poisson  and  service  times  for  parallel  tasks 
are  independent,  and  studies  staffing  of  time-varying  emergency  departments  and  synchronization 
delays  under  Markovian  assumptions.  Both  dissertations  of  Zviran  [58]  and  Zaied  [57]  are  motivated 
from  applications  in  patient  flow  analysis.  Gurvich  and  Ward  [17]  study  optimal  matching  policies 
for  a  pure  join  model  (Markovian)  with  multiple  classes  of  jobs  under  certain  matching  constraints. 

This  work  contributes  to  the  recent  development  for  non-Markovian  many-server  queueing  mod¬ 
els.  We  only  mention  those  that  are  most  relevant  to  our  work  due  to  the  large  volume  of  papers 
on  many-server  models.  Krichagina  and  Puhalskii  [28]  first  observe  that  the  system  dynamics  of  an 
infinite-server  queueing  model  can  be  represented  by  an  integral  functional  of  a  sequential  empirical 
process  driven  by  service  times.  They  show  that  the  diffusion-scaled  processes  counting  the  num¬ 
ber  of  jobs  in  the  system  can  be  approximated  by  a  functional  of  a  standard  Kiefer  process  driven 
by  service  times.  Pang  and  Whitt  [39,  41]  generalize  that  approach  to  establish  two-paranreter 
process  limits  for  G/G/oo  queues  when  the  service  times  are  i.i.d.  and  weakly  dependent,  respec¬ 
tively.  Reed  [45]  and  Puhalskii  and  Reed  [44]  have  observed  a  relationship  between  finite-server 
and  infinite-server  queues  and  generalized  the  approach  in  [28]  to  obtain  the  diffusion  limits  for 
G/GI/N  queues  in  the  Halfin- Whitt  regime.  Mandelbaum  and  Momcilovic  [33]  generalize  the 
approach  by  Reed  [45]  to  study  G/GI/N  +  GI  queues  with  abandonment.  All  these  papers  use 
sequential  empirical  processes  driven  by  a  sequence  of  univariate  random  variables.  Our  approach 
to  study  fork-join  networks  with  NES  uses  multiparameter  sequential  empirical  processes  driven  by 
a  sequence  of  i.i.d.  random  vectors  and  properties  of  multiparameter  processes  and  martingales. 

Notation  Throughout  the  paper,  the  following  notation  will  be  used.  M  and  M+  (Rrf  and  Ml, 
respectively)  denote  sets  of  real  and  real  non-negative  numbers  (ci-dimensional  vectors,  respectively, 
d>  2).  Z+  is  the  set  of  non-negative  integers.  N  denotes  the  set  of  natural  numbers.  For  a,  b  G  M, 
we  denote  a  A  b  :=  min(a,  b)  and  a  V  b  :=  max(a,  b).  For  x  G  M,  let  x+  :=  max{x,  0}  and 
x~  :=  —  min{x,0}.  For  any  x  G  M+,  [xj  is  used  to  denote  the  largest  integer  less  than  or  equal  to 
x.  We  use  bold  letter  to  denote  a  vector,  e.g.,  x  :=  (aq,  ...,xjv)  £  M^.  0  denotes  the  vector  whose 
components  are  all  0.  For  x,y  G  Mw,  we  denote  x  <  y,  x  >  y  and  x  >  y  in  the  componentwise 
sense,  and  let  x  A  y  =  (aq  A  yi,  ...,xn  A  vn)-  We  use  1(A)  to  denote  the  indicator  function  of  a 
set  A.  The  abbreviation  a.s.  means  almost  surely.  For  any  univariate  distribution  function  F(-), 
we  denote  Fc(-)  =  1  —  F(-).  For  a  G  and  a  G  M+,  we  call  Aa(8)  ( resp .  Aa(8))  is  a  5-grid  of 
[0,ai]  x  [0,0:2]  (resp.  [0,  a]),  if  Aa(5)  (resp.  Aa(8))  is  a  finite  partition  of  [0,  ai]  x  [0,0:2]  (resp. 
[0,  a]),  where  each  element  of  the  partition  is  the  rectangle  [si,ti)  x  [s2,i2)  (resp.  [s,t)),  satisfying 
0  <  <  4  <  ccfc  for  k  =  1,2  (resp.  0  <  s  <  t),  and  min/c=1)2(4  —  Sfc)  >  8  (resp.  t  —  s  >  8).  For 

two  real-valued  functions  /  and  g ,  we  write  f(x)  =  0(g(x ))  if  limsup^^  \  f(x)/g(x)\  <  00. 

All  random  variables  and  processes  are  defined  on  a  common  probability  space  (D,  F,  P).  For 
any  two  complete  separable  metric  spaces  and  52,  we  denote  <Si  x  S2  as  their  product  space, 
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endowed  with  the  maximum  metric,  i.e.,  the  maximum  of  two  metrics  on  5i  and  S2 ■  Sk  is  used 
to  represent  fc-fold  product  space  of  any  complete  and  separable  metric  space  5  for  k  6  N.  For  a 
complete  separable  metric  space  5,  B([0,  00),  5)  denotes  the  space  of  all  5-valued  cadlag  functions 
on  [0,  00),  and  is  endowed  with  the  Skorohod  J\  topology  (see,  e.g.,  [5,  15,  55]).  Denote  D  = 
D([0,  00), M).  The  space  B([0,  00), D),  denoted  as  Djj,  is  endowed  with  the  Skorohod  J\  topology, 
that  is,  both  inside  and  outside  D  spaces  are  endowed  with  the  Skorohod  J\  topology.  For  a 
complete  separable  metric  space  5,  the  space  D([0,  oo)2, 5)  is  the  space  of  all  5-valued  “continuous 
from  above  with  limits  from  below”  functions  on  [0,oo)2,  and  is  endowed  with  the  same  metric  as 
defined  by  [18].  D2  =  0( [0,  l]2,  M)  is  denoted  as  the  space  of  all  “continuous  from  above  with  limits 
from  below”  functions  on  the  unit  square  [0,  l]2  in  the  sense  of  Neuhaus  [35],  and  is  endowed  with 
the  same  metric  do2  as  in  [35] .  Weak  convergence  of  probability  measures  /j,n  to  //  will  be  denoted 

df 

as  nn  =>•  n-  For  a  sequence  of  processes  {Xn  :  n  >  1}  and  a  process  X,  we  use  notation  Xn  A  X 
to  denote  the  convergence  in  finite-dimensional  distributions  of  Xn  to  X. 


2  The  Infinite-Server  Fork-Join  Network  Model 

2.1  Model  and  Assumptions 

In  this  section,  we  present  a  detailed  description  of  our  infinite-server  fork-join  network  model  and 
the  assumptions.  As  shown  in  Figure  1,  there  is  a  single  class  of  jobs,  and  each  job  is  forked  into  K 
parallel  tasks,  K  >  2.  Each  task  is  processed  in  a  service  station  with  multiple  servers  under  the 
FCFS  discipline.  There  is  an  infinite  number  of  servers  at  each  station.  After  service  completion, 
each  task  will  join  a  waiting  buffer  for  synchronization  associated  with  each  service  station,  and 
when  all  tasks  of  the  same  job  are  completed,  they  will  be  synchronized  and  leave  the  system.  Here 
we  assume  that  the  synchronization  process  takes  zero  amount  of  time. 

Let  A  :=  {A(t)  :  t  >  0}  be  the  arrival  process  of  jobs  with  r*  representing  the  arrival  time 
of  the  job,  i  £  N.  Let  \rf  :  i  >  1}  denote  the  i.i.d.  service  time  vectors  of  the  parallel 
tasks.  The  joint  distribution  of  the  service  time  vector  for  the  ith  job  if  is  F(x)  :=  F(x  1, ... ,xk ) 
for  Xk  >  0,  k  =  1  Their  marginal  distributions  are  Ff ;(x),  for  x  >  0,  k  =  1  The 

joint  distribution  of  any  two  service  times  //*  and  rfk  is  Fj^(xj,Xk)  :=  P{rfj  <  Xj,rfk  <  x\~ )  for 
Xj,Xk  >  0,  j,k  =  1  Note  Fjt ^(-, -)  =  Ffc(-)  when  j  =  k  for  j,k  =  1  We  denote 

F?k(xj,Xk)  ■=  P(Vj  >  xj  5  rfk  >  xk)  =  1  -Fj(xj)-Fk(xk)+Fjtk(xj,xk)  for  xj,xk  >  0,  j,  k  = 

Note  Fjk(-,  •)  =  Fk(-)  when  j  =  k  for  j,  k  =  1,  Let  rfm  :=  max{r/] ,  ...,rfK}  be  the  maximum 

of  the  components  in  the  service  vector  rj1,  and  Fm(x)  :=  P{rfm  <  x)  =  F(x,...,x)  for  x  >  0. 
(Throughout  the  paper,  we  use  subscript  “m”  to  index  quantities  and  processes  associated  with 
the  maximum.)  The  service  process  is  assumed  to  be  independent  of  the  arrivals.  We  exclude  the 
case  of  perfectly  positively  correlated  parallel  services  since  that  will  lead  to  empty  waiting  buffers 
for  synchronization. 

Let  Xk  :=  {Xk(t)  :  t  >  0}  be  the  process  counting  the  number  of  tasks  in  service  at  the  service 
station  k ,  and  Yk  =  {Yk(t)  :  t  >  0}  be  the  process  counting  the  number  of  tasks  in  the  waiting 
buffer  for  synchronization  (unsynchronized  queue)  after  service  completion  at  service  station  k, 
k  =  1, ...,  K .  Let  5  :=  {5(f)  :  t  >  0}  be  the  process  counting  the  number  of  synchronized  jobs  and 
Dk  :=  {.Dfc(t)  :  f  >  0}  be  the  process  counting  the  number  of  tasks  that  have  completed  service  at 


station  k,  k  =  1  Denote  X  :=  (X\ ,...,Xk),  Y  '■=  (Yi, ...,  Yr-)  and  D  :=  {D\, ...,  Dk)-  We 

assume  that  the  system  starts  empty. 

Assuming  that  the  arrival  process  Aft)  is  Poisson  with  rate  A,  by  Poisson  random  measure 
theory,  we  can  easily  obtain  the  following  properties  on  the  processes  Xft),  Yft)  and  Sft)  at  each 
time  t. 

Proposition  2.1.  If  the  arrival  process  A(t )  is  Poisson  with  rate  X,  then  at  each  time  t  >  0,  for  k  = 
1  X k(t)  has  a  Poisson  distribution  with  rate  X  Ff(s)ds,  Yk(t)  has  a  Poisson  distribution 

with  rate  X  f^(Ff1(s)  —  Fj?(s))ds,  and  S(t)  has  a  Poisson  distribution  with  rate  X  Fm(s)ds.  For 
each  time  t  >  0  and  j,  k  =  1, K, 

Cov(Xj(t),Xk(t ))  =  A  f*  Fj  k(s,  s)ds,  (2.1) 

CovfYjft),  Yk(t))  =  X  [t(Fjjk(s,s)-Fm(s))ds,  (2.2) 

Cov(Xj(t),Yk(t))  =  X  f  (Fk(s)  -  Fjtk(s,s))ds.  (2.3) 

Jo 

For  each  time  t  >  0  and  k  =  1, ...,  K ,  Sft )  is  independent  of  Xk(t)  and  Yk(t ).  When  K  =  2,  Y\  (t) 
and  Yi{t)  are  independent  for  each  t  >  0. 

When  the  arrival  process  Aft)  is  general,  we  will  obtain  heavy-traffic  limits  for  the  fluid  and 
diffusion  scaled  processes  of  (X,  Y,  S )  jointly.  We  will  let  the  arrival  rate  grow  large  for  the  system 
to  be  in  heavy  traffic.  For  that,  we  consider  a  sequence  of  such  systems  indexed  by  n  and  use 
superscript  n  for  the  processes  A,X,Y,D,S ,  and  the  arrival  times  {rt  :  i  >  1},  but  we  let  the 
service  times  { r f  :  i  >  1}  and  their  distribution  functions  be  independent  of  n.  We  make  the 
following  assumption  on  the  arrival  process  An. 

Assumption  1:  FCLT  for  arrivals.  There  exist:  (i)  a  continuous  nondecreasing  deterministic 
real-valued  function  a  on  [0,  oo)  with  o(0)  =  0  and  (ii)  a  stochastic  process  A  with  continuous  sample 
paths,  such  that 

An  :=  n~  2  ( An  —  na)  =$■  A  in  D  as  n  — >  oo.  (2-4) 

■ 

It  follows  from  (3.6)  that  we  have  the  associated  FWLLN 

An 

An  :=  —  a  in  D  as  n  — >  oo.  (2-5) 

n 

When  the  arrival  process  is  renewal,  the  limit  in  (2.5)  is  aft)  =  At,  for  t  >  0  and  some  positive 
constant  A,  and  the  limit  in  (3.6)  is  A  =  X c^Ba,  where  cf  is  the  squared  coefficient  of  variation 
(SCV)  of  an  interarrival  time,  and  Ba  is  a  standard  Brownian  motion  (BM). 

We  also  make  a  regularity  assumption  on  the  joint  service-time  distribution  function  Ffx). 

Assumption  2:  Service  time  distributions.  The  joint  distribution  function  Ffx)  of  the 
service  time  vectors  {rjl  :  i  £  N}  is  continuous.  ■ 

From  the  graphical  representation  of  the  system  dynamics  in  Figure  2(a),  we  can  write,  for  each 
t  >  0  and  k  =  1, ...,  K, 

An(t) 

*?(*)=  £  1(t"  +  4  >t),  (2.6) 

i— 1 
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(2.7) 


An(t) 

Yk(t)  =  ^  l(r”  +  <  t  and  rf  +  ifk,  >  t  for  some  k'  /  A:) 

2=1 
An(t) 

=  x;  (w +4  so-iw +*<<)) 

2=1 
An(t) 

=  Y1  W7?  +  ??m  >  t)  ~  +  rfk  >  *))  > 

2=1 

An(t)  An(t) 

s"(t)  =  x;  iw +*<t)=  E  iw + 4  <  *.  «)■  (2-s) 

2=1  2=1 

An(t) 

om=  E  !W  +  4<i)-  (2.9) 

2=1 

The  following  balanced  equations  hold  for  each  t  >  0  and  k  =  1, K , 

Dk(t)  =  An(t)  —  Xk(t),  (2.10) 

Y?(t)  =  D%(t)  -  (2.11) 

As  we  have  remarked  in  the  introduction,  by  previous  work  on  G/GI/oo  queues  [28],  each 
individual  process  Xf  and  Df  ( resp .  Sn )  can  be  represented  by  an  integral  of  a  sequential  empirical 
process  driven  by  a  sequence  of  i.i.d.  random  variables  {rfk  :  i  >  1}  (resp.  {rfn  :  i  >  1})  for  each 
k  =  1,  Thus,  Gaussian  limits  for  the  diffusion-scaled  processes  Xk,  DV  and  Sn  in  heavy  traffic 

for  each  k  can  be  established,  and  as  a  consequence,  a  Gaussian  limit  for  the  diffusion-scaled  process 
Yk  can  be  obtained  from  those  of  Dk  and  Sn ,  k  =  1, ...,  K.  However,  that  approach  does  not  give  a 
characterization  of  the  joint  Gaussian  distribution  of  the  limiting  processes  of  the  diffusion-scaled 
processes  (. Xn,Yn,Sn ). 

We  will  represent  all  the  processes  Xn,  Yn ,  Sn  as  integrals  of  a  multiparameter  sequential  empir¬ 
ical  process  Kn  :=  {Kn(t,x)  :  t  >  0,x  €  M7)'}  driven  by  the  sequence  of  service  vectors  {rf  :  i  >  1}: 

[nt\ 

Kn(t,x)  :=  -  V  l(rf  <x),  t>  0,  x  G  R*  (2.12) 

n  ' 

%=i 

That  is,  we  write,  for  t  >  0  and  k  =  1, 

Xf(t)  =  n  [  [  l(s  +  xk  >  t)dKn  (An(s),x)  ,  (2-13) 

Jo  Jr* f 

Tfc"(A)  =  n  f  (  (l(s  +  .Tfc  <  f)  —  l(s  +  ajj  <  t ,  Vj))  dKn  (An(s),x)  ,  (2-14) 

ao  Vm* 

and 

Sn(t)  =  n  f  f  l(s  +  Xj  <  t,  Mj)dKn  (An(s),x)  .  (2-15) 

Jo  Jr* 

The  integrals  in  (2.13),  (2.14)  and  (2.15)  are  well-defined  as  a  Stieltjes  integral  for  functions  of 
bounded  variation  as  integrators. 
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2.2  An  FCLT  for  Multiparameter  Sequential  Empirical  Processes 

We  present  an  FCLT  for  multiparameter  sequential  empirical  processes  Un  :=  {Un(t,x)  :  t  >  0,  x  £ 
[0, 1]*}  driven  by  a  sequence  of  i.i.d.  random  vectors  with  uniform  marginals: 

[nt.\ 

Un(t,x)  :=  <x)-H(x)),  t>  0,  xg[0,1]a\  (2.16) 

i= i 

where  for  each  i  E  N,  £*  :=  (£] , . ...  flK)  is  a  vector  of  nonnegative  random  variables  with  continuous 
joint  distribution  function  H(-)  and  uniform  marginals  over  [0, 1]. 

The  convergence  for  the  processes  Un(t,x )  is  established  in  the  space  D([0,  oo),  D([0, 1]A ,  M)). 
We  remark  that  this  theorem  is  in  the  same  spirit  as  Lemma  3.1  in  [28],  where  an  FCLT  is  proved 
for  the  two-parameter  process  Un(t,x )  in  the  univariate  case  in  the  space  D([0,  oo),  D([0, 1],  M)). 
We  generalize  that  result  to  the  multivariate  setting. 

Theorem  2.1.  The  multiparameter  sequential  empirical  processes  Un{t,x )  defined  in  (2.16)  con¬ 
verge  weakly  to  a  continuous  Gaussian  limit, 

Un{t,x)  =>  U(t,x)  in  D([0,  oo),  B([0, 1]A ,  M))  as  n  -»  oo,  (2-17) 

where  U(t.,x )  is  a  continuous  Gaussian  random  field  with  mean  function  E[U(t,x)\  =  0  and  covari¬ 
ance  function 

Cov(U(t,x),U(s,y))  =  (tAs)(H(x  Ay)  -  H(x)H(y)),  t,s>  0,  x,ye[  0,1]A. 

To  show  the  FCLT  for  the  processes  (Xn,Yn,  Sn ),  we  define  the  diffusion-scaled  multiparameter 
sequential  empirical  processes  Kn  :=  { Kn(t,x )  :  t  >  0,16  Ma}  by 

[nt.] 

Kn(t,x)  :=  ^  (l(»7*  <  x)  —  F(x))  ,  t  >  0,  iGlf.  (2-18) 

Vn  , 

1=1 

Theorem  2.1  can  be  applied  to  show  an  FCLT  for  the  processes  Kn(t.,x).  Define  F  :  RA  — »•  [0, 1]A 
with  F(x)  =  (Fi(xi), ...,  Fk(xk))-  By  Sklar’s  theorem  [46],  for  any  multivariate  distribution  func¬ 
tion  F.  there  exists  a  unique  multivariate  distribution  function  H  (called  “copula”)  with  uni¬ 
form  marginals  on  [0,1]  such  that  F(x)  =  H(F(x ))  when  the  marginal  distribution  functions  F^, 
k  =  1  are  continuous.  Then,  Kn(-,-)  can  be  represented  as  a  composition  of  Un(-,-)  with 

F(-)  in  the  second  component,  i.e., 

Kn(t,x)  =  Un{t,F(x)),  t>  0,  xeR*. 

Thus,  it  follows  from  Theorem  2.1  that  the  processes  Kn(t,x)  converge  in  distribution: 

kn(t,x)  =  Un(t,F(x))  =>  K(t,x)  :=  U(t,F{x))  in  D([0,  oo),  D#)  as  n  ^  oo,  (2.19) 
which  implies  that 

Kn(t,x)  =>■  k(t,x)  :=  tF(x)  in  D([0,  oo), R>k)  as  n  -»  oo.  (2.20) 
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2.3  FWLLN  and  FCLT 

We  define  fluid-scaled  processes  Xn,  Yn  and  Sn  by 


Xn  :=  1Xn,  Yn  :=  -Yn,  Sn  :=  -Sn.  (2.21) 

n  n  n 

The  FWLLN  for  (. Xn,Yn,Sn )  is  stated  in  the  following  theorem. 

Theorem  2.2  (FWLLN).  Under  Assumptions  1  and  2,  the  fluid-scaled  processes  converge  to 
deterministic  fluid  functions, 


(. An,Xn,Yn,Sn )  =*►  (a,X,Y,S)  (2.22) 

in  p2A'+2 

as  n  — >•  oo,  where  the  limits  are  all  deterministic  functions:  a  is  the  limit  in  (2.5),  for 
each  t  >  0, 


Xk(t)  :=  /  Ff{t-s)da(s),  for  k  =  l,...,K, 


(2.23) 


Y(t) Yk(t)  :=  /  (Ffflt-s)-Ff(t-s))dd(s),  for  k  =  l,...,K,  (2.24) 

Jo 

S(t)  :=  (  Fm(t  —  s)da(s).  (2.25) 

Jo 

When  a(t)  =  A t  for  a  constant  arrival  rate  A  >  0  and  <  oo  for  k  =  1, ...,  K , 


Xk(oo)  :=  lim  Xk{t)  =  A E[r)\\,  k  =  1, 

t—t  OO 

Yfc(oo)  :=  lim  Yk(t)  =  A -  E[rik]),  k  =  1, ...,  Ji, 

I— >•  OO 

lim  M  _  A. 
t— >oo  t 

We  define  the  diffusion  scaling  of  Xn,  Yn  and  Sn  by 

Xn  :=  y/fl{Xn  -  X),  Yn  :=  y/n(Yn  -  Y),  Sn  :=  y/fl{Sn  -  S). 


(2.26) 

(2.27) 

(2.28) 

(2.29) 


We  will  show  the  following  FCLT  for  these  diffusion-scaled  processes. 

Theorem  2.3  (FCLT).  Under  Assumptions  1  and  2,  the  diffusion- scaled  processes  converge  in 
distribution, 

(. An ,  Kn,Xn,Yn,  Sn )  =►  (i,  K,X,Y,  S)  (2.30) 

in  D  x  D([0, oo), Da')  x  ID)2A+1  as  n  — >•  oo,  where  A  is  the  limit  in  (3.6),  K  is  the  limit  in  (2.19), 
which  is  independent  of  A,  and  for  t  >  0  and  k  =  1, ...,  K, 


X(t) 


Mi(t )  :=  i  =  1,2, 

f  -  s)d4(s)  =  i(t)  -  T  i(«W(t  -  S), 

JO  JO 
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(2.31) 

(2.32) 


Mfcj2(i) 
S(t) 
Vi  (t) 

V2(t) 

Y(t) 

Zk,i(t) 


1  (s  +  xk  >  t)dK(a(s),x)  =  —  /  /  1  (s  +  xk  <  t)dK(a(s),x),  (2.33) 

Jo  J Rf 


Vi(t)  +  V2(t), 

[  Fm(t  -  s)dA(s)  =  -  f  A(s)dFm(t  -  s), 
Jo  Jo 


1  {s  +  Xj  <  t,  \/j)dK(a(s),x), 


Zi(t)  +  Z2(t),  Zi(t)  :=  (Zlti(t), ...,  ZKji(t)),  *  =  1,2, 


(2.34) 

(2.35) 

(2.36) 

(2.37) 


/  (Fk(t-s)-Fm(t-s))dA(s)=  A(s)d(Fm(t  -  s)  -  Fk(t  -  s)),  (2.38) 

/  o  Jo 

f  [  (l(s  +  Xk<t)  —  l(s  +  Xj<t,  Vj))dK(a(s),x)  =  — Mfcj2(i)  -  V2(t)(2.39) 
/ o 


The  processes  M2,  ^2  and  V2  are  defined  in  the  mean-square  sense.  This  is  in  the  same  way 
as  the  limit  process  with  respect  to  a  standard  Kiefer  process  for  the  G/GI/oo  queue  is  defined  in 
[28,  39].  The  limit  processes  are  characterized  in  the  next  subsection. 


2.4  Characterization  of  the  Limit  Processes 

In  this  section,  we  show  the  Gaussian  property  of  the  limiting  processes  (X,Y)  and  S  when  the 
arrival  limit  process  is  a  Brownian  motion. 

Theorem  2.4  (Gaussian  Property).  Under  Assumptions  1  and  2,  when  the  arrival  limit  process 
A  is  a  Brownian  motion,  i.e.,  A(t )  =  caBa(a(t ))  for  a  standard  Brownian  motion  Ba,  a  positive 
constant  ca  >  0  and  t  >  0,  the  limiting  processes  (X,Y)  and  S  in  Theorem  2.3  are  well-defined 
continuous  Gaussian  processes.  For  each  t  >  0, 

(X(t.),Y(t))  =  N(0,Y(t)),  and  S(t)  =  N{0,  as(t)), 
where  for  j ,  k  =  1, ...,  K , 

:=  Cov(Xj(t),Xk(t))  =  [  \F^k{t-s,t-s)  +  (c2a-l)F^(t-s)Ff(t-s)]dd{s),  (2.40) 

Jo  L  J 

o-Jk(t)  ■■=  Cov(Yj(t),Yk(t ))  =  [  \(Fjtk(t  -s,t-s)~  Fm(t  -  s)) 

Jo  L 

+  (cl  -  1  )(Fj(t  -  s)  -  Fm(t  -  s))(Fk(t  -  s)  -  Fm(t  -  s))  da(s),  (2.41) 

vfk  (t)  :=  Cov(Xj(t),  Yk(t ))  =  f  ( Fk(t  -  s)  -  Fj:k(t  -s,t-  s)) 

Jo  L 

+  (c2a  -  1)  {F?(t  -  s)(Fk(t  -s)-  Fm(t  -  s)))  ]  dd(s),  (2.42) 

and 

(J S(t)  :=  Var(S(t))  =  [  Fm(t  -  s)da(s)  +  (c2a  -  1)  [  (Fm(t  -  s))2dd(s).  (2.43) 

Jo  Jo 
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When  the  arrival  rate  function  aft)  =  A t  for  a  positive  constant  A  >  0, 


(X(t),Y(t))  =4>  (.X'(oo),  F(oo))  =  N(0,  X(oo))  as  t oo, 
lim  t~1Var(S(t))  =  A c2, 

t— >oc 

where  for  j,  k  =  1, K , 


roc  roc 

pfk{°°)  :=  A  /  Fj,k(s>  s)ds  +  x(cl  -  !)  /  Fj(s)Ff(s)ds, 
Jo  Jo 


Ojfc(°°)  ■=  X  I  (Fj,k(s -  s)  -  Fm(s))  +  (c£  -  1)(-Fj(s)  -  Fm(s))(Fk(s )  -  Fm(s))  ds, 


^\y(oo)  :=  A 


(Ffc(S)  -  Fjjk(s,  s))  +  -  1)  (Ff(s)(Fk(s)  -  Fm(s ))) 


ds. 


We  make  the  following  remarks  on  the  Gaussian  property  of  the  limiting  processes. 


(2.44) 

(2.45) 

(2.46) 


(i)  When  we  set  (?a  =  1,  the  variance  and  covariance  formulas  coincide  with  those  in  the  Poisson 
arrival  case  in  Proposition  2.1. 

(ii)  When  K  =  2  and  c2a  =  1,  Cov(Yj(t),Yk(t ))  =  0,  for  t  >  0  and  k,j  =  with  k  /  j, 

even  if  the  service  times  of  parallel  tasks  are  correlated,  since  both  terms  inside  the  integral 
in  (2.41)  vanish. 


(iii)  We  emphasize  the  interesting  structure  of  the  variances  of  Xk  and  Yk  and  their  covariances, 
k  =  1,  Recall  that  for  G/GI/oo  queues  [28],  the  steady-state  variance  formula  of  the 

number  of  jobs  in  the  system  is  given  as  the  sum  of  two  terms,  the  mean  and  the  coefficient 
( c \  —  1)  multiplying  an  integral  associated  with  the  service  time  distribution;  for  example, 
when  E[rjfr]  <  oo,  the  variance  of  the  steady-state  number  of  tasks  in  the  kr,h  service  station 
is 


r  oo 

Var(Xk( oo))  =  A E[r)\]  +  A (<£  -  1)  /  ( Ff(s))2ds ,  k  =  1, ...,  K. 

Jo 

It  turns  out  that  the  steady-state  variance  formula  for  the  number  of  tasks  in  the  waiting  buffer 
for  synchronization  has  the  same  structure;  for  instance,  when  E[rjl]  <  oo  for  k  =  1 
the  variance  of  the  steady-state  waiting  buffer  size  at  the  kth  service  station  is 


Y  ar(Yk(oo)) 


X(E[t,1\-E[t,1])  +  X(4 


1)  (F^(s)-Ff(s))2ds, 


k  =  1 


The  same  structure  also  exists  for  the  covariances  between  Xj  and  Yk,  as  shown  in  (2.42),  for 
k,j  =  1,  -,K. 

(iv)  The  synchronized  process  does  not  have  a  Brownian  motion  limit,  but  its  limiting  process  is 
Gaussian,  and  has  the  same  variability  as  the  arrival  process  when  the  arrival  rate  is  constant. 
This  can  be  also  explained  by  regarding  the  synchronized  process  as  the  departure  process 
of  a  G/GI/oo  queue  with  the  same  arrival  process  and  service  times  as  the  maximum  of  the 
service  vectors  (see  [28,  39]). 
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To  explore  the  impact  of  the  correlation  among  the  service  times  of  each  job’s  parallel  tasks 
on  the  system  dynamics,  we  consider  the  case  when  the  service  vector  if  has  the  joint  continuous 
distribution  function 

F(x)  =  (1  -  p)  G(xk)  +  pG  (  jnin  {a*}')  (2.47) 

with  a  marginal  continuous  distribution  function  G(-),  for  0  <  p  <  1,  xk  >  0  and  k  =  1 
Namely,  the  service  times  at  the  parallel  stations  have  the  same  distribution,  and  are  symmetrically 
correlated  with  a  correlation  parameter  p  £  [0, 1).  We  state  the  mean  and  covariance  functions  of 
the  performance  measures  studied  above  as  functions  of  the  parameter  p  in  the  following  corollary. 

Corollary  2.1.  Under  the  same  assumptions  in  Theorem  2.4,  when  the  service  vector  rf  has  the 
joint  distribution  function  F  in  (2.47),  for  each  t  >  0  and  k  =  1, ...,  K,  Xk(t )  and  V  ar{Xk{t ))  are 
the  same  as  in  (2.23)  and  (2.40),  respectively, 

Yk(t)  =  (1  -  p)  [  [( Gift  -  s)(l  -  (G(t  -  s))A_1)]  dd(s), 

Jo 

Var(Yk(t ))  =  f  [(1  -  p)G(t  -  s)(l  -  (G(t  -  s))^1) 

Jo  L 

+  (1  -  p)2(c2a  ~  1  )(G(t  -  s))2  (1  -  (G(t  -  s))*-1)2  dd(s), 

Cov(Xk(t),Yk(t ))  =  (c2  -  1)(1  -  P)  r  [ Gc(t  -  s)G(t  -s)(  1  -  (G(t  -  s))^1)]  dd(s), 

Jo 

for  j,  k  =  1, ...,  K  and  j  /  k, 

CoV(Xj(t),Xk(t ))  =  f  [(1  -  p)(Gc(t  -  s))2  +  pGc(t  -s)  +  (c2  -  l)(Gc(t  -  s))2lda(s), 

Jo  1  J 

CoufawMt))  =/*[(!-  P)(G(t  -  s))2(  1  -  (G(t  -  s))K~2) 

Jo  L 

+  (1  -  p)2(c2a  -  1  ){G(t  -  s ))2  (1  -  (G{t  -  s))7^1)2  dd(s), 

CoviXjft ),yk(t))  =  (1  -p)  f  fc(t  -  s)Gc(t  -  s ) 

Jo  L 

+  (c2  —  1  )Gc(t  —  s)G(t  —  s)  (l  —  (G(t  —  s))*-1)  da(s), 

and 

S(t)  =  f  (1  -  p)(G(t  -  s))K  +  pG(t  -  s)  da(s), 

Jo  1  J 

Var(S(t))=  f  [(1  -  p)(G(t  -  s))R  +  pG(t  -  s)]  da(s) 

Jo 

Tic2-1)  /  [(1  -  p)(G(t- s))K  +  pG(t- s)]2  dd(s). 

Jo 
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We  make  several  remarks  on  the  impact  of  the  correlation  among  the  service  vector.  The  mean 
and  the  variance  of  X^it)  are  not  affected  by  the  correlation,  but  the  covariances  of  Xj(t )  and  Xk{t) 
increase  linearly  in  p  for  t  >  0  and  j,  k  =  1,  ...K  with  j  7^  k.  The  mean  of  Tfc(t)  decreases  linearly  in 
p  and  the  mean  of  S(t)  increases  linearly  in  p  for  t  >  0  and  k  =  1, K.  The  covariances  of  Yj(t)  and 
Yk(t)  decrease  in  p,  in  the  order  of  (1  —  pj2,  but  the  covariances  of  Xj(t)  and  Yfc(f)  decrease  linearly 
in  p  for  t  >  0  and  j,k  =  1,  ...K .  The  variance  of  S(t)  increases  in  p.  in  the  order  of  p2,  for  t  >  0. 
The  intuitive  interpretation  for  these  observations  is  that  positive  correlation  makes  the  parallel 
tasks  more  likely  to  finish  close  to  each  other  so  that  the  waiting  time  for  synchronization  becomes 
less  and  more  jobs  are  synchronized.  It  is  also  important  to  emphasize  that  the  covariances  of  Yj[t) 
and  Yk(t)  and  the  covariances  of  Xj(t)  and  Y^(t)  decrease  in  different  orders  in  the  correlation 
parameter  p  for  t  >  0  and  j,  k  =  1,  ...K.  The  same  observations  hold  for  the  associated  steady-state 
performance  measures. 


2.5  Comparison  with  a  fork-join  network  with  ES 


We  make  a  comparison  with  an  associated  fork-join  network  with  ES.  We  use  superscript  “ES”  in 
the  corresponding  processes  for  this  model.  Let  the  arrival  and  service  processes  be  the  same  as 
the  model  described  above.  The  only  difference  is  the  synchronization  constraint.  Here  tasks  are 
not  tagged  with  a  particular  job,  so  that  whenever  there  are  tasks  completed  at  all  parallel  service 
stations,  the  oldest  completed  task  at  each  waiting  buffer  for  synchronization  will  be  synchronized. 
It  is  evident  that  when  the  arrival  process  A(t)  is  Poisson,  the  processes  YES(t)  and  SES(t)  do  not 
have  a  Poisson  distribution  at  each  time  t  >  0,  k  =  1  In  this  case,  for  each  k  =  1 

X'k’ES  and  jj'^ES  wiH  have  the  same  representations  as  in  (2.6)  and  (2.9),  but  the  processes  Sn,ES 
and  y£’ES  become 

Sn’ES(t)  =  min  {Dn'ES{t)},  t>  0,  (2.48) 

1<J<K  J 

and 

Y?ES(t)  =  Dnk'ES(t)  -  Sn'ES(t)  =  Dnk’ES(t)  -  min  {D]’ES(t)},  t  >  0.  (2.49) 

Thus,  at  any  time,  one  of  the  waiting  buffers  for  synchronization  should  be  empty.  It  is  evident  that 
the  processes  Sn'ES  and  Yk'ES  cannot  be  represented  as  a  single  integral  of  the  multiparameter 
sequential  empirical  process  Kn  as  in  equations  (2.15)  and  (2.14),  respectively. 

We  now  discuss  more  on  the  comparison  for  the  steady-state  mean  values  of  the  fluid  limits  of 
these  processes  when  the  arrival  rate  is  constant.  In  the  ES  model,  the  synchronization  process 
SES  can  be  represented  as  the  minimum  of  the  departure  processes  from  all  parallel  stations,  and 
these  departure  processes  are  dependent  due  to  the  correlation  of  service  vector  of  each  job.  Thus, 
we  are  unable  to  obtain  a  distributional  approximation  of  the  processes  SES  and  YES ,  k  =  1 
However,  for  each  t  >  0,  by  applying  the  previous  results  on  G/GI/oo  queues  [28],  we  can  obtain 
the  mean  values  of  the  fluid  limit  YES(t),  k  =  1, ...,  K,  and  SES(t ): 


YES(t)  :=  A 


[  Fk(s)ds  -  min  j  f  Fj(s)d, 

Jo  i<?<*  [J 0 


u  0 
YES(  oo] 


:=  A  I  max .{E[rj]}}  -  E[r,\] 


\1  <j<K 


as  t  — >  00, 


(2.50) 
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SES  (t) 


A  min  /  [  Fj(s)ds\  =  Xt  —  X  max  /  [  Ff(s)ds\  , 

1<3<k{J0  j  l<j<K  l./o  ^  J 


and 


lim 

t—}  OO 


SB5(t) 

t 


=  X. 
(2.51) 


Recall  that  the  steady-state  mean  value  of  the  waiting  buffer  for  synchronization  in  our  model 
Yk{ oo)  =  A  {E[Vl}  -  E[r,l})  in  (2.27),  k  =  1  denoted  as  Y^ES{ oo)  for  the  comparison 

purpose.  It  is  evident  that  the  average  waiting  buffer  sizes  for  synchronization  under  NES  constraint 
are  larger  than  those  under  ES  constraint,  even  though  the  total  synchronization  throughput  rates 
are  the  same,  lim^oo  SES(t)/t  =  lim^oo  SNES(t)/t  =  A.  We  also  observe  that  when  the  parallel 
service  times  are  perfectly  positively  correlated,  the  difference  Y^ES( oo)  —  YES{ oo)  becomes  zero 
for  k  =  1 ,  ...,if.  We  summarize  this  comparison  result  in  the  following  proposition. 

Proposition  2.2.  Under  Assumptions  1  and  2,  when  a(t)  =  Xt  for  a  positive  arrival  rate  A  >  0 
and  E[r /£]  <  oo  for  k  =  1, ...,  K , 

Y*ES(o o)  -  Yes(oo)  =  X {E[ril]  ~  ^a: x_{£[r/j]})  >  0,  for  k  =  1, ...,  K.  (2.52) 

By  the  extreme  value  theory,  if  the  service  vector  has  i.i.d.  components  such  that  the  service 
time  distribution  lies  in  the  domain  of  attraction  for  Gumbel  extremal  distribution,  then  we  have 
aK (dm  ~  ^k)  Z  as  K  — >  oo,  where  Z  has  a  Gumbel  distribution,  and  ax  and  bx  are  constants 
depending  on  K ;  see  Chapter  1  in  [31].  The  Gumbel  distribution  has  cdf  P(Z  <  z)  =  e~e  “ ,  z  >  0, 

with  mean  E[Z]  =  7  ~  0.5772,  the  Euler-Mascheroni  constant,  and  variance  Var(Z)  =  7r/\/6  ~ 

1.2825.  For  one  example,  if  the  service  vector  has  i.i.d.  components  of  an  exponential  distribution 
with  rate  1,  then  ax  =  1  and  bx  =  In  if  (see  Example  1.7.2  of  [31]),  for  k  =  1, ...,  K, 

YkNES( 00)  -  yfcES(oo)  =  A  l  -  l)  w  A(ln(^)  -  !)  as  ^  oo-  (2-53) 

For  another  example,  if  the  service  vector  has  i.i.d.  components  of  a  lognormal  distribution 
LN( 0, 1),  we  have,  for  k  =  1, ...,  K, 

YkrES(oo) —  YES(oo)  &  X^/ax +  bx  —  e1/2)  as  I<  — >  00,  (2-54) 

where  ax  and  bx  are  (see  Example  1.7.4  of  [31]): 

ax  =  (2  In if)1//2  exp  j  —  (2  In K)1^2  +  0.5(21nif)-1/2(lnlnif  +  ln(47r))|  , 

and 

bx  =  exp  |(21nil')1/2  —  0.5(2  In  if) _1/2  (In  In  if  +  ln(47r))|  . 


2.6  Numerical  Example 

In  this  section,  we  provide  a  numerical  example  with  two  parallel  tasks  (if  =  2),  comparing  our 
approximations  with  simulations.  We  let  the  arrival  process  be  renewal  with  arrival  rate  A  =  100 
and  the  SCV  c2  =  5.  The  service  times  of  the  two  parallel  tasks  are  assumed  to  be  a  bivariate 
Marshall-Olkin  hyperexponential  distribution,  which  is  a  mixture  of  two  independent  bivariate 
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Marshall-Olkin  exponential  distributions  [34].  A  bivariate  Marshall-Olkin  exponential  distribution 
function  Fmo(xiV)  for  the  random  vector  (X,  Y)  can  be  written  as  F^IO(x,y)  :=  P(X  >  x,Y  > 
y)  =  exp(— fi\x  —  H2 y  —  Unix  V  y)),  x,  y  >  0,  where  three  parameters  p.i,  p,2,  H12  are  such  that  the 
two  marginals  are  exponential  with  rates  fi\  +  /J\2  and  p/2  +  H12  and  their  correlation  p  =  pi2/(pi  + 
^2  +  ^12)  £  [0,1].  We  denote  MO(Ai,  A2,  p)  for  a  bivariate  Marshall-Olkin  exponential  distribution, 
where  Ai  and  A2  are  the  rates  for  the  marginals,  and  p  is  the  correlation  parameter,  for  which  the 
parameters  p\  =  (Ai  -  pA2)/(l  +  p),  P2  =  (A2  —  yoAi)/(l  +  /?)  and  p\2  =  (p(Ai  +  A2))/(l  +  p).  In  the 
numerical  example,  we  take  a  mixture  of  MO( 4/5, 1,  p\ )  with  probability  0.4  and  MO( 6/5, 6/5,  P2) 
with  probability  0.6,  such  that  the  means  of  the  two  hyperexponential  marginals  are  mSi  1  =  1  and 
mS; 2  =  0.9.  By  setting  pi  =  P2  =  0,  we  have  two  independent  parallel  service  times,  and  by  setting 
pi  =  0.7  and  P2  =  172/679,  we  obtain  that  the  correlation  (see  the  correlation  formula  in  §5.2  [40]) 
between  the  two  parallel  service  times  is  equal  to  0.5. 

In  Table  1,  we  show  the  approximation  values  for  the  mean,  variance  and  covariance  of  X k 
and  Yfc,  for  k  =  1,2,  and  compare  them  with  the  corresponding  simulated  values.  To  estimate 
the  simulated  values,  we  simulated  the  system  up  to  time  40  with  4000  independent  replications 
starting  with  an  empty  system,  which  we  call  one  experiment.  In  each  replication,  we  collected  data 
over  the  time  interval  [20, 40]  and  formed  the  time  average  (the  system  tends  to  be  in  steady  state 
in  less  than  5  time  units).  We  conducted  5  independent  experiments  and  took  sample  averages  as 
estimations  for  simulated  values.  To  construct  the  95%  confidence  interval  (Cl),  we  used  Student 
t-distribution  with  four  degrees  of  freedom.  The  halfwidth  of  the  95%  Cl  is  2.776.ss/ y/5,  where  S5 
is  the  sample  deviation. 

We  make  several  remarks  for  the  numerical  example.  First,  our  approximations  match  very  well 
with  the  simulated  values.  Second,  the  size  of  waiting  buffers  for  synchronization  is  quite  large,  of 
the  same  order  as  the  number  of  tasks  in  the  service  stations.  Third,  we  find  that  when  the  two 
parallel  tasks  are  positively  correlated,  the  mean  and  the  variance  of  X^’s  are  the  same  as  those 
in  the  independent  case,  while  the  covariance  between  X\  and  X2  gets  larger,  the  mean  and  the 
variance  and  covariances  of  Y/’s  and  the  covariances  between  X^  and  Yj  become  smaller  than  those 
in  the  independent  case,  j,k  =  1,2.  These  are  also  consistent  with  the  observations  in  Corollary 
2.1.  Note  that  this  numerical  example  is  more  general  than  that  considered  in  Corollary  2.1. 

3  The  multi-server  fork-join  network  model 

3.1  Model  and  Assumptions 

In  this  section,  we  present  a  detailed  description  of  our  multi-server  fork-join  network  model.  We 
consider  a  fork-join  network  with  a  single  class  of  jobs,  and  each  job  is  forked  into  K  ( K  >  1) 
parallel  tasks.  Each  task  is  processed  in  a  service  station  with  finite  servers  under  the  non-idling 
FCFS  discipline.  Namely,  a  newly  arriving  task  immediately  gets  served  if  there  is  an  idle  server  in 
that  station,  and  joins  the  back  of  the  queue  otherwise,  and  the  task  waiting  for  the  longest  in  the 
queue  enters  service  as  soon  as  a  server  in  that  station  becomes  available.  After  service  completion, 
each  task  will  join  a  waiting  buffer  for  synchronization  associated  with  each  service  station,  and 
when  all  tasks  of  the  same  job  are  completed,  they  will  be  synchronized  and  leave  the  system.  Here 
we  assume  that  the  synchronization  process  takes  zero  amount  of  time. 

Let  A  :=  {A(t)  :  t  >  0}  be  the  arrival  process  of  jobs  after  time  0.  Let  r*  be  the  arrival  time  of 
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Table  1:  Comparing  approximations  with  simulations  in  a  stationary  model 


(Xi,x2) 

(E[X\\,  E[X2]) 

(Var(Xi),  Var(X2)) 

Cov(X  !,X2) 

P  =  0 

Sim.  (95%  CL) 

(99.99  ±  0.17  ,  89.98  ±  0.12) 

(296.26  ±  0.66,  269.46  ±  0.70) 

234.14  ±  0.66 

Approx. 

(100.00,  90.00) 

(296.00,  269.27) 

233.99 

p  =  0.5 

Sim.  (95%  CL) 

(99.98  ±  0.04,  89.99  ±  0.04) 

(296.08  ±  0.57,  269.23  ±  0.80) 

256.34  ±  0.43 

Approx. 

(100.00,  90.00) 

(296.00,  269.27) 

256.30 

(Yi,Y2) 

(£[ii],£[y2]) 

(V'ar(Yi),Var(Ya)) 

Cov(Yi,  Y2) 

p  =  o 

Sim.  (95%  CI.) 

(43.18  ±  0.05  ,  53.20  ±  0.10) 

(70.12  ±  0.20,  89.85  ±  0.40) 

31.53  ±  0.30 

Approx. 

(43.20,  53.20) 

(70.31,  90.08) 

31.55 

p  =  0.5 

Sim.  (95%  CI.) 

(20.89  ±  0.01,  30.88  ±  0.02) 

(27.14  ±  0.15,  42.23  ±  0.35) 

8.36  ±  0.07 

Approx. 

(20.89,  30.89) 

(27.05,  42.23) 

8.31 

(.X,Y) 

CovfX^Yi) 

Cov(  A'i,Y2) 

Cov(X2,Yi) 

Cov(X2,Y2) 

P  =  o 

Sim.  (95%  CI.) 

60.80  (±  0.59) 

122.87  (±  0.61) 

99.21  (±  0.42) 

64.56  (±  0.54) 

Approx. 

61.09 

123.10 

99.85 

64.57 

p  =  0.5 

Sim.  (95%  CI.) 

28.72  (±  0.33) 

68.37  (±  0.73) 

47.51  (±  0.42) 

34.49  (±  0.44) 

Approx. 

28.67 

68.37 

47.41 

34.44 

the  ?'th  job,  i  €  N,  that  is,  Aft)  =  max{i  >  1  :  r*  <  t}  for  t  >  0  and  A(0)  =  0.  Let  Nj~  be  the  number 
of  servers  at  service  station  k,  k  =  Each  job  brings  in  a  /C-dimensional  service  vector, 

representing  the  service  time  at  each  service  station,  which  can  be  correlated.  Let  rf  :=  (rj\, ...,  rfK) 
be  the  service  vector  of  the  job  that  arrives  at  time  Tj,  i  6  N,  where  rf.  is  the  service  time  at  the 
kth  service  station.  We  assume  that  the  sequence  {rf  :  i  >  1}  is  i.i.d.,  and  let  the  joint  distribution 
function  of  rf  be  F(x)  =  F{x\, ... ,xk )  for  xq  >  0,  k  =  1,  ...,K.  Let  Fc{x)  :=  P{rj\  >  aq,  ...,rjlK  > 
xk),  for  x\ ,...,xk  >  0.  Their  marginal  distributions  are  Tfc(-)  with  mean  l/pk  £  (0,  oo),  for 
k  =  1  Let  rfm  :=  maxjjyj, ..., rfK}  and  Fm{ x)  :=  P{rfm  <  x)  =  P{rf-  <  x,Vj)  =  F(x,...,x) 

for  x  >  0.  (Throughout  this  paper,  we  use  “m”  to  index  quantities  and  processes  associated  with 
the  maximum.)  We  make  a  regularity  assumption  on  the  service  time  distributions  for  the  parallel 
tasks. 

Assumption  1.  The  joint  distribution  function  F{x)  of  the  service  time  vector  r]1,  i  6  N,  is 
continuous. 

State  Descriptors.  Let  Xk  :=  {Xk(t)  :  t  >  0}  be  the  process  counting  the  number  of  tasks  at 
the  service  station  k,  and  Yy.  :=  {Yy.(t)  :  t.  >  0}  be  the  process  counting  the  number  of  tasks  in 
the  waiting  buffer  for  synchronization  (unsynchronized  queue)  after  service  completion  at  service 
station  k,  k  =  1  ,...,K.  Denote  X  :=  (X\, ...,  Xk)  and  Y  :=  (Yi, ...,  Yk).  Let  S  :=  { S(t )  :  t  >  0} 
be  the  process  counting  the  number  of  synchronized  jobs  by  each  time  t  >  0.  In  addition,  let 
Qk  ■=  { Qk{t )  ■  t  >  0}  and  Bk  :=  {Bk(t)  :  k  >  0}  be  the  processes  representing  the  queue  length 
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and  the  number  of  tasks  in  service  at  station  k,  respectively,  k  =  1, K.  Let  D :=  {D^ft)  :  t  >  0} 
be  the  cumulative  service  completion  (departure)  process  at  service  station  k,  k  =  1, K.  Denote 
Q  '■=  (Qi,  •••)  Qk)-,  B  :=  (Bi,  ...,Bk),  and  D  :=  (D\, ...,  Dk)- 

A  Sequence  of  Systems.  We  consider  a  sequence  of  the  above  fork-join  networks,  indexed  by 
superscript  n  and  let  n  -A  oo.  We  assume  that  each  service  station  is  operating  in  the  many-server 
heavy-traffic  asymptotic  regimes,  where  the  arrival  rate  of  jobs  and  the  number  of  servers  get  large 
appropriately  while  the  service  time  distributions  are  fixed.  In  establishing  the  FLLN,  we  allow 
the  arrival  rate  to  be  time-dependent.  In  establishing  the  FCLT,  we  will  assume  that  each  service 
station  is  operating  in  the  Haffin- Whitt  (QED)  regime,  so  that  it  is  critically  loaded  with  a  constant 
arrival  rate  (see  Assumption  4  for  the  precise  definition).  For  any  process  A,  we  use  Xn  to  represent 


the  associated  process  in  the  sequence  of  the  fork-join  networks. 

Some  Fundamental  Flow  Balance  Equations.  For  each  service  station  k,  k  =  1  and  for 

each  t  >  0,  we  have  the  following  flow  conservation  equations: 

XZ(t)  =  Bnk(t)  +  QUt),  (3.1) 

XZ(t)=X%(0)  +  An(t)-DUt),  (3.2) 

Y?{t)  =  Y?(0)  +  DW)-Sn{t).  (3.3) 

The  non-idling  condition  implies  that  for  each  k  =  1, ...,  K  and  t  >  0, 

B£(t)  =  XZ(t)ANi,  Ql(i)  =  (Xl(t)  -  NZ)* .  (3.4) 


In  addition,  we  have  the  following  flow  balance  equation,  for  each  k ,  k!  =  1, ...,  K ,  k  /  k! ,  and  t  >  0, 


XZ(t)  +  Ykn(t)  =  Xm  +  Y$(t),  (3.5) 

that  is,  the  total  numbers  of  tasks  in  each  service  station  and  its  associated  waiting  buffer  for 
synchronization  are  equal  at  all  time,  and  are  equal  to  the  total  number  of  jobs  in  the  system. 


3.2  Fluid  Limit 

In  this  section,  we  present  the  fluid  limit  for  the  fork-join  network.  We  assume  that  the  system 
starts  from  empty  and  allow  the  arrival  rate  to  be  time-dependent. 

Assumption  2.  There  exists  a  continuous  nondecreasing  deterministic  real-valued  function  a(t) 
on  [0,  oo)  with  a(0)  =  0  such  that 

An{t )  :=  n~1An(t)  a(t )  in  D  as  n  — >■  oo.  (3-6) 

We  also  make  the  following  assumption  on  the  numbers  of  servers. 

Assumption  3.  For  k  =  1, ...,  K,  NJf  :=  NJf/n  — >  >  0  as  n  — )•  oo. 

Under  the  empty  initial  condition,  we  can  write  the  processes  X£(t),  Yf'(t),  k  =  1, ... ,K ,  and 
Sn(t)  as 

An(t) 

xtm  =  E 1  w + <■' +vi>t),  t  >  o.  (3.7) 

i= 1 
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(3.8) 


An(t) 

Yk(t)  =  1(Til  +  wk’1  +  7lk  ^  t’  Ti  +  +  7fk'  >  t,  for  some  k'  /  k),  t>  0, 

2=1 

An(t) 

<?"(*)  =  X(Ti  +wk’1  +  r?k<t,  V7c  =  1,  t>  0,  (3.9) 

2=1 

where  '(/;?’*  is  the  waiting  time  of  the  ith  arrival  at  station  k,  i£R. 

In  addition,  for  k  =  l,...,/i,  let  Ej£(t)  be  the  number  of  tasks  that  have  entered  service  at 
station  k  by  time  t,  t  >  0,  and  set  E%  :=  {E%(t)  :  t  >  0}.  Denote  En  :=  (E™, ...,  E7^).  For  each 
service  station  k  =  1,  K,  we  also  have  the  balance  equation 

m)  =  An(t)  -  Ql(t)  =  An(t)  -  (XW)  -  NJ})+,  t  >  0.  (3.10) 

Define  the  fluid-scaled  processes  Xn  :=  n~lXn  for  Xn  =  Xn,Yn,Sn,En,Qn,Bn,Dn.  We  now 
state  the  FLLN  for  the  fluid-scaled  processes. 

Theorem  3.1.  Under  Assumptions  1-3, 

(. An,Xn,Yn,Sn,En,Qn,Bn,Dn )  =>  (a,X,Y,  S ,  E ,  Q ,  B ,  D)  (3.11) 

in  p6A>2 

as  n  -A  oo,  where  the  limits  are  all  deterministic  functions:  a  is  the  limit  in  (3.6), 
(. E,X,Y,S )  is  the  unique  solution  to  the  following:  fort  >  0  and  k  =  1 

Xk(t)  =  f  Ff(t  -  s)da(s)  +  f\xk(t  -  a)  -  Nk)+dFk(s),  (3.12) 

io  Jo 

Ek(t )  =  a(t)  -  (Xk(t)  -  Nk)+,  (3.13) 

S(t)  =  [...[(  min  {Ek(t  -  sk)}  \  dF(si,...,sK),  (3.14) 

J0  Jo  \k=l,-,K  J 

Yk(t )  =  T  Fk(t  -  s)dd(s)  -  [\xk(t  -s)-  Nk)+dFk(s )  -  S(t),  (3.15) 

Jo  Jo 

and  the  limits  Q,  B  and  D  satisfy 

Qk(t)  =  (Xk(t)  -  Nk)+,  Bk(t)  =  Xk(t)  A  Nk,  Dk(t)  =  a(t)  -  Xk(t).  (3.16) 

It  is  easy  to  check  that  for  each  k  =  1, ...,  K,  the  limit  Xk(t)  also  satisfies  the  following  equation: 

Xk(t)  =  a(t )  -  [  Ek(t-  s)dFk(s),  t>  0.  (3-17) 

Jo 

When  a(t )  =  X(s)ds  and  the  service  times  are  exponential  (independent  or  dependent),  where 
A(-)  is  a  positive  function,  for  each  k  =  1,  the  fluid  limit  Xk  in  (3.12)  and  (3.17)  becomes  an 
ordinary  differential  equation  (ODE)  [38],  but  the  fluid  limit  Yk  in  (3.15)  does  not  have  an  ODE 
representation.  We  remark  that  the  fluid  limit  Xk  for  each  k  =  1, ...,  K  depends  only  on  the  marginal 
distribution  Fk,  while  the  fluid  limits  Yk,  k  =  1,  and  S  depend  on  the  joint  distribution  F. 
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However,  as  the  FCLT  (Theorem  3.2)  below  shows,  the  limits  for  all  these  processes  in  the  diffusion 
scale  will  depend  on  the  joint  distribution  F. 

When  the  arrival  rate  is  constant  and  each  service  station  is  underloaded  or  critically  loaded,  we 
give  a  corollary  on  the  steady  states  of  the  fluid  limits.  The  proof  follows  from  a  direct  calculation 
and  is  omitted.  It  is  evident  that  correlation  among  service  times  of  parallel  tasks  only  affects  the 
steady  state  of  Y  but  not  that  of  X. 

Corollary  3.1.  Under  Assumptions  1-3,  if  the  arrival  rate  is  constant,  a(t)  =  A t,  for  A  satisfying 
0  <  A  <  Nkpk  for  all  k  =  1, ...,  K , 

(X(t),Y(t),Q(t),B(t))  — >  (X(oo),Y(oo),Q(oo),B(oo))  as  t  — >  oo, 

and 

~(D(t),  E(t),  S(t))  — >  A  :=  (A, ...,  A)  as  t  — >  oo, 

where 

Xk(oo)  =  Bk( oo)  =  A E[ril]  =  X/Uk,  Yk(oo)  =  A {E[r]}n}  -  E[r]l\ ),  Qk( oo)  =  0. 


3.2.1  Numerical  Examples 

We  give  two  numerical  examples  to  show  the  effectiveness  of  fluid  approximations  comparing  with 
simulations,  when  I\  =  2.  We  let  the  arrival  process  be  Poisson  with  time- varying  rate  A (f)  = 
200  +  120sin(f),  t  >  0.  The  numbers  of  servers  in  stations  1  and  2  are  Aq  =  300  and  IV2  =  340, 
respectively.  In  the  first  numerical  example,  the  service  times  of  the  two  parallel  tasks  are  assumed 
to  have  a  bivariate  Marshall-Olkin  exponential  distribution  [34],  For  our  first  numerical  example, 
we  set  the  service  times  to  be  MO(1,0.9,  p)  such  that  the  service  times  of  the  two  parallel  tasks 
have  exponential  marginals  with  means  1  and  10/9  in  stations  1  and  2,  respectively,  and  their 
correlation  is  p.  The  numerical  results  with  p  =  0  and  p  =  0.5  are  provided  in  Figure  3(a),  marked 
with  “ind.”  and  “corr.”,  respectively.  In  the  second  numerical  example,  we  let  the  service  times  of 
the  two  parallel  tasks  have  a  bivariate  Marshall-Olkin  hyperexponential  distribution  [40],  which  is 
a  mixture  of  two  independent  bivariate  Marshall-Olkin  exponential  distributions.  Specifically,  we 
take  a  mixture  of  MO( 4/5, 1,  pi)  with  probability  0.4  and  AfO(6/5,  27/32,  P2)  with  probability  0.6, 
such  that  the  two  parallel  service  times  have  hyperexponential  marginals  with  the  same  means  as 
the  first  example.  By  setting  p\  =  p2  =  0,  we  have  two  independent  parallel  service  times,  and 
by  setting  p\  =  0.7  and  p2  =  521/1232,  we  get  the  correlation  between  the  two  parallel  service 
times  to  be  0.5.  In  Figure  3(b),  we  show  the  numerical  results  with  p  =  0  (“ind.”)  and  p  =  0.5 
(“corr.”).  To  calculate  the  simulated  values,  we  simulated  the  system  up  to  time  20  with  500 
independent  replications  starting  with  an  empty  system.  We  make  two  remarks  from  numerical 
results.  First,  the  fluid  approximations  match  very  well  with  the  simulated  results.  Second,  the 
positive  correlation  among  parallel  service  times  does  not  affect  Xk,  but  reduces  Yk,  for  k  =  1,2. 

3.3  FCLT  in  the  Halfin- Whitt  regime 

In  this  section,  we  study  the  fork-join  network  with  NES  in  the  Halfin- Whitt  regime,  which  requires 
that  each  service  station  operates  in  a  critically  loaded  regime  asymptotically.  Specifically,  we 
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Figure  3:  Comparison  of  fluid  approximations  with  simulations. 

assume  the  following.  Let  Xn  be  the  arrival  rate  of  jobs  such  that  Xn  :=  X n/n  — >  X  >  0  as  n  — >  oo, 
and  set  NJ}  :=  nNk ,  where  Nk  G  N,  and  :=  Xn/(fikNJ})  for  each  k  =  1, 

Assumption  4.  For  each  k  =  1, ...,  K,  X  =  Nk^k  and,  y/re(  1  —  pk)  — >•  >  0,  as  n  — >•  oo. 

The  arrival  processes  An  =  {An(t)  :  t  >  0}  satisfy  an  FCLT. 

Assumption  5.  There  exists  a  stochastic  process  A  with  continuous  sample  paths  satisfying 

An(t)  —  \nt 

An(t)  := - -j= - =>•  Aft)  in  D  as  n  — >  oo.  (3.18) 

\Jn 

It  follows  from  (3.18)  that  we  have  the  associated  FLLN: 

An(t)  =>  At  in  D  as  n  — >  oo.  (3.19) 

We  now  describe  the  non-empty  initial  conditions.  Due  to  the  complexity  from  initial  conditions, 
we  focus  on  the  case  of  K  =  2,  but  our  approach  can  be  extended  to  K  >  2.  For  convenience,  we 
use  the  notation  k'  to  denote  its  counterpart,  i.e. ,  k'  =  1  {k'  =  2,  respectively)  if  k  =  2  (k  =  1, 
respectively),  for  k  =  1,  2.  At  time  0—,  there  are  A^(0)  tasks  at  service  station  k,  and  Y£( 0)  tasks 
in  its  associated  waiting  buffer  for  synchronization,  for  k  =  1,2.  Let  An(0)  :=  (X™(0),  Aj? (0))  and 
Fn(0)  :=  (T1n(0),T2n(0))-  Recall  the  flow  balance  equation  (3.5).  At  time  0—, 

-X'fc(O)  +  Ykn( 0)  =  X£(0)  +  yfc7( 0),  k  =  1,  2,  (3.20) 

which  is  equal  to  the  number  of  jobs  in  the  system.  Note  that  AT(0)  >  171(0)  for  each  k  =  1,  2, 
since  tasks  in  the  waiting  buffer  associated  with  station  k!  for  synchronization  must  be  in  station 
k,  either  in  service  or  in  queue.  Let  Bk{ 0)  :=  min(Xf(0),Nk)  and  Qk( 0)  :=  (A^)(0)  —  NJ?)+  be  the 
number  of  tasks  in  service  (busy  servers)  and  the  queue  length  at  station  k  at  time  0—,  respectively, 
k  =  1,2.  We  also  assume  that  171(0)  <  Bk( 0)  for  k  =  1,2.  This  is  not  a  restrictive  assumption, 
because  in  the  Halfin- Whitt  regime,  waiting  times  for  service  at  each  station  are  0(1 /y/n)  and 
service  times  are  0(1),  and  jobs  that  have  completed  tasks  in  one  station  and  joined  its  waiting 
buffer  for  synchronization  have  their  associated  tasks  receiving  service  in  the  other  station  with 
probability  one  asymptotically. 
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Let  Jn( 0)  :=  minfc=12{-B^(0)  —  1^(0)}  be  the  number  of  jobs  whose  both  tasks  are  in  service 
at  time  0—.  Then  Zk( 0)  :=  Bk( 0)  —  Y$( 0)  —  Jn(0)  represents  the  number  of  jobs  in  the  system  at 
time  0—  whose  task  k  is  in  service  but  whose  task  k'  is  in  queue  waiting  for  service,  k  =  1,2.  Let 
In( 0)  :=  <3” (0)  AQ2  (0)  be  the  number  of  jobs  (if  any)  whose  both  tasks  are  in  queue  at  their  service 
stations  at  time  0—.  Then  Rk(0)  :=  Qk( 0)  —  In( 0)  represents  the  number  of  jobs  (if  any)  whose 
task  k  is  waiting  in  queue  for  service  while  whose  task  k'  is  in  service,  k  =  1,2.  (Note  that  our 
assumption  above  implies  that  if  a  job  is  waiting  in  queue  at  station  k ,  its  parallel  task  can  be  either 
in  queue  or  in  service  at  station  k! .)  By  our  definition,  we  can  see  that  Zk( 0)  =  Rk,( 0),  fc  =  1,2.  Set 
Rn( 0)  :=  (i?”(0),  i?2  (0))  and  Zn( 0)  :=  (Z”(0),  Z^  (0)).  We  also  obtain  a  decomposition  for  Xk(0): 

xm  =  BU 0)  +  QU 0)  =  Y£( 0)  +  Jn(0)  +  Z%( 0)  +  /n(0)  +  Rnk{ 0),  k  =  1,  2.  (3.21) 

We  let  {wk’1  :  i  =  1,...,QJJ(0)}  be  the  sequence  of  remaining  waiting  times  of  the  tasks  in 
station  k  at  time  0—,  k  =  1,2.  It  is  in  the  order  of  their  positions  in  queue:  uu’1  is  the  remaining 

waiting  time  of  the  task  in  the  front  of  the  queue  while  wk  k  is  that  for  the  task  in  the  end 
of  the  queue  at  station  k  at  time  0—,  k  =  1,2.  Let  {fjlk  :  i  =  1, ...,  £>£(0)}  be  the  sequence 
of  remaining  service  times  of  the  tasks  in  station  k  at  time  0—,  for  k  =  1,2.  Let  {j]k  '■  i  = 
1,  ...,Q))(0)}  be  the  sequence  of  service  times  of  the  tasks  in  station  k  that  are  in  queue  at  time 
0— ,  k  =  1,2.  Without  abuse  of  notation,  we  use  {fjkYk  :  i  =  1, ...,  Yj!i(0)},  {%’J  :  i  =  1, ...,  Jn(0)} 
and  {ffkZ  :  i  =  1, ...,  Zk( 0)},  which  are  partitioning  subsets  of  {fjk  :  i  =  1, ...,  Bk{ 0)},  to  represent 
the  remaining  service  times  of  the  tasks  in  station  k  at  time  0—  corresponding  to  the  quantities 
Yk,{ 0),  Jn( 0)  and  Zk( 0),  respectively,  k  =  1,2.  Similarly,  we  use  {w7^’1’1  :  i  =  l,...,/n(0)}  and 
{wk,l’R  :  i  =  1, ...,  /?-(())},  which  are  partitioning  subsets  of  {wk)l  :  i  =  1, ...,  Qk( 0)},  to  represent  the 
remaining  waiting  times  of  the  tasks  in  station  k  at  time  0—  corresponding  to  the  quantities  In(0) 
and  Rk(0),  respectively,  k  =  1,2.  Finally,  we  use  {rfk  :  i  =  1, ...,  In(0)}  and  {//(,' R  :  i  =  1, ...,  i?£(0)}, 
which  are  partitioning  subsets  of  {rfk  :  i  =  1, ...,  Qk(0)},  to  represent  the  service  times  of  the  tasks 
in  station  k  corresponding  to  the  quantities  In( 0)  and  Rk(0)  in  queue  at  time  0—,  respectively, 
k  =  1,2.  We  assume  that  these  initial  quantities  are  independent  of  the  arrival  process  An  and  the 
service  times  of  new  arrivals  after  time  0. 

We  can  now  give  a  representation  for  the  processes  Xn,  Yn  and  Sn:  for  t  >  0  and  k  =  1,2, 


xm 


Sn{t) 


and 


>2n(  0)  Yj”(0)  J"(  0) 

Y  1(fiiYl  ^  *)  +  1^2  -  1(^,J  ^  w) 

i= 1  i= 1  i= 1 

z?(  0)  Z?(0) 

'  /  j  ±\'l  1  —  b  w2  ^  12  -  y  *-\wl  w  b  '/2 

i=l  j=l 

J"(0)  A™  (t) 

+  Y  Mw]’*’1  +  Vj1  <t,Vj)+  Y  1(^Ti  +  WT  +  ^  V^)’ 

2=1  2=1 


(3.23) 

<*) 


ifcn(t)  =  l?(0)+^n(0)  +  ^(i)-^(t)-^(t).  (3.24) 
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We  use  the  convention  that  X^=i  =  0  throughout  the  paper. 

We  impose  the  following  assumptions  on  the  initial  quantities. 

Assumption  6.  There  exists  (Yi(0),  Y^O))  G  such  that 

(An(0),yn(0))  :=n_1(jrn(0),yn(0))  =*►  (X(0),r(0))  in  M4  as  n  ->  oo, 

whereX(  0)  :=  (N\,  N2)  andY(0 )  :=  (Yi(0),  ^(O)).  There  exist  random  vectors  X  (0)  :=  (Xi(0),  ^2(0))  G 
M2  andY( 0)  :=  (Yi(0),  ^(O))  G  M2  such  that 

(Xn(0),yn(0))  :=  Vn(Xn(0)  -X(0),yn(0)  -y(0))  =*►  (X(0),y(0))  in  M4  as  n^oo. 


This  assumption  implies  that  the  associated  fluid-scaled  initial  quantities 

(Jn(0),Zn(0),In(0),Rn(0))  :=  n~\jn(0),Zn(0),In(0),Rnm  =*►  (J(0),Z(0),7(0),  Jj(0)) 

in  M6  as  n  — >  00,  where 

J(0)  :=  Ni  -  y2(0)  =N2-  y(0),  Z( 0)  :=  (Zi(0),Z2(0))  :=  (0,0),  7(0)  :=  0,  R(0)  :  =  (0,0). 

Define  the  associated  diffusion-scaled  quantities  (Jn(0),Z  (0),7"(0),i?  (0))  by 

H 0)  :=  Jn(0)~_nJ(0),  4"(0)  :=  Zf),  *(0)  :=  ^(0):=^,  7  =  1,2. 


Then  Assumption  6  implies  that 

>(0),zn(0),7”(0),if(0)}  =*►  (j(0),z(0),7(0),fl(0)) 


m 


as  n  — >  00, 


where 


7(0)  :=  min{-(lfc(0))-  -  ^(0)},  Zk{ 0)  :=  -(Xfc(0))“  -  yfc,(0)  -  7(0),  k  =  1,2, 

k= 1,2 

7(0)  :=  min  (Afc(0))+,  Rk( 0)  :=  (Xfc(0))+  -  7(0),  7  =  1,2. 

k= 1,2 

Let 

1  k 

Fk,e(t)  '■=  TTTT  /  Fk(s)dsi  t 

7o 

be  the  equilibrium  distribution  associated  with  F&,  k  =  1,2. 

Assumption  7.  For  k  =  1,2,  {//)).  :  i  G  N}  is  a  sequence  ofi.i.d.  random  variables  with  distribution 
Fk  e  and  for  each  i  G  N,  tj)  and  ff2  are  independent.  {rfk  :  i  G  N}  is  a  sequence  of  i.i.d.  random 
variables  with  distribution  Fk  for  each  i  G  N  and  k  =  1,2.  {{rff1 ,  r^’7)  :  i  G  N}  is  a  sequence  of  i.i.d. 
random  vectors  with  a  joint  distribution  F(-,-).  {(^’  ,  77^.’,  )  :  i  G  N}  is  a  sequence  of  i.i.d.  random 
vectors  with  independent  components,  7  =  1,2. 

Finally,  we  also  make  an  assumption  for  the  residual  waiting  times  {w^  :  i  =  1, ...,  Qk(0)}, 
k  =  1,  2. 
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Assumption  8.  The  residual  waiting  times  of  the  tasks  in  queue  {w£l  :  i  =  1, Qf  (0)},  k  =  1,  2, 
converge  to  zero  a.s.  as  n  — >  oo. 


We  define  the  diffusion-scaled  processes  X  :=  (Xf,  Xf),  Y  :=  (Y™,  V2n)  and  Sn  by 

t  >  0,  (3.25) 


y«{t):=K WzSCW, 


n 


n 


n 


for  k  =  1, 2,  where 


t  rt 


Sn(t )  :=  nS°(t)  +  A" 


((f  -  si)  A  (t  -  s2))  dF(si,  s2), 


o  J  o 


5°(i)  :=  ^(0)FiiC(t)  +  Y1(Q)F2,e(t)  +  J(0)Fi,e(f)F2,e(t), 
*Z*(t)  :=  nYk( 0)  +  A'T  -  S"(t). 

From  the  balance  equation  for  Yjf  in  (3.24),  we  can  rewrite  Yff  as 


(3.26) 

(3.27) 

(3.28) 


Y£(t)  =  Y£(0)  +  X%(0)  +  An(t)  -  X%(t)  -  Sn(t),  t>  0,  A:  =  1,2.  (3.29) 

Recall  E'u(t)  is  defined  as  the  cumulative  number  of  tasks  entering  service  by  time  t  >  0  at 
station  k.  k  =  1,  2,  assuming  the  system  starts  empty  in  §3.2.  Without  abuse  of  notation,  in  §3.3 
related  to  the  FCLT,  we  let  Ef(t)  be  the  number  of  new  arrivals  after  time  0  whose  task  k  has 
entered  service  by  time  t  >  0  at  station  k,  k  =  1,2. 

^  77  ^  71  ^  71  ^  71  ^  71  ^  ^  71  /s  ^  71 

Define  the  diffusion-scaled  processes  (E  ,Q  ,B  ,D  ),  E  :=  (Ef ,  Elf ),  Q  :=  (Qf,  Q%),  B  := 
(Bf,Bf)  and  D  :=  (Df,Df),  by 

ZT'n/'j-A  _  \n.L 

Hit)  -  — •  Qtt)  -  (mw,  m) ■■= -(Hm)-, 

y/U 

DW):=XZ(0)  +  An(t)-X%(t),  t>  0,  As  =  1,2.  (3.30) 


For  si,  s2  >  0,  let 

Sn(s i,a2)  :=  -^((^(Si)A^(S2))-An(Sl  As2)) 

Vn 

=  (^r(si)  +  (Ara/ y/nj (si  —  si  A  s2))  A  (Elf  (s2)  +  (An/ \/n)(s2  —  si  A  s2)). (3.31) 

Before  we  present  the  FCLT  for  the  fork-join  network  with  NES  in  the  Halfin- Whitt  regime, 
we  provide  some  preliminaries  for  the  limit  processes.  The  limit  processes  will  be  functionals  of 
a  generalized  multiparameter  Kiefer  process,  as  a  limit  of  the  multiparameter  sequential  empirical 
process  driven  by  the  service  time  vectors  of  new  arrivals.  Define  the  multiparameter  sequential 
empirical  processes  Kn  :=  {Kn(ti,  t2,  x)  :  t\  >  0,  i2  >  0,  x  G  by 

[nti\A[nt2\ 

Kn(t  i,t2,x):=-=  V  (1  (rii<x)-F(x)).  (3.32) 

y/n  t—* 

2=1 

We  prove  the  convergence  of  Kn  in  the  space  D([0,  oo)2, D([0,  oo)2,  M))  endowed  with  a  generalized 
Skorohod  J\  topology  defined  in  [18]  in  Proposition  3.1. 
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Proposition  3.1.  Under  Assumption  1 


Kn(ti,t2,x)  =>•  K{t\,t2,x)  in  D([0,  oo)2,  D([0,  oo)2, M))  as  n  — >  oo,  (3.33) 

where  K{t\,t2,x)  is  a  continuous  Gaussian  random  field,  called  a  generalized  multiparameter  Kiefer 
process,  with  mean  E[K(ti,t2,x)]  =  0  and  covariance  function 

Cov(K(s1,s2,x),K(ti,t2,y))  =  (si  A  s2AtiAt2)(F(x/\y)  -  F(x)F(y)),  (3.34) 

for  Sk ,  tk  >  0,  k  =  1, 2,  and  x,y  G  M+. 

We  define  the  processes  Wk  :=  {Wk{t)  '■  t  >  0},  Wf  :=  {W^(t)  :  t  >  0}  and  W  :=  {W(t)  :  t  >  0} 
as  integral  functionals  of  K :  for  f  >  0,  k  =  1,2, 


A  rt 


Wk(t)  :=  /  /  /  l(sfc  <  t)dK(Xsi,  Xs2,x), 

Jo  Jo  Jr i 


Vh(t)  :=  /  /  /  l(sj  +  Xj  <tyj)dK(\si,Xs2,x), 

1 0  JO 


(3.35) 

(3.36) 


and 


Wf(t)  :=Wk{t) -W(t)  =  (  j  I  l(sk  +  xk  <  t,ski  +xkf  >  t)dk(Xsi,Xs2,x),  (3.37) 

Jo  Jo  Jr\ 

where  the  integrals  are  defined  in  the  sense  of  mean-square  limits  (see  the  precise  definition  in  §??). 

Proposition  3.2.  The  processes  Wk,  Wf  and  W  are  well-defined  continuous  Gaussian  processes 
with  mean  zero,  and  for  0  <  s  <  t  and  k  =  1,2, 

E[(Wk(t)  -  Wfc(s))2]  =  A  f\ Fk(t  -u)-  Fk(s  -  u))(l  -  Fk(t  -  u)  +  Fk(s  -  u))du , 

Jo 


rt  rt 


E[(W (t)  -  W{s)Y ]  =  X  /  [A F((s  -  si ,  s  -  s2);  (t  -  sut  -  s2))] 

Jo  Jo 

x  [1  -  AF((s  -S!,S-  s2);  s2))]d(si  A  s2), 


(3.38) 


E[{Wf{t)  -  Wf{s)Y]  =  E[(Wk(t )  -  Wk(s)Y]  +  E[(W(t )  -  W(s))^ 


-  2A  /  /  [F(f  -  si,t  -  s2)  -  Ffc,fc'(s  -  Sfc,t  -  sfc/) 

Jo  Jo 

+  (Pfe(i  -  Sk)  -  Fk(s  -  Sk))(F(s  -  si,s  -  s2)  -  F(t  -  si,t  -  s2))]d(si  A  s2), 
and  covariance  functions 

Cov{Wk{t),Wk'{t))  =  X  f  f  [F(t  -  si,t  -  s2)  -  Fk(t  -  sk)Fk'(t  -  sk>)]d(si  A  s2), 

Jo  Jo 
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Cov(Wk(t),  Wk,(t))  =  A  f  [  [Fk(t  -  sk)F(t  -  si,t  -  s2)  -  Fk(t  -  sk)Fk>(t  -  sk')]d(si  A  s2), 

Jo  Jo 

Cov(Wk(t),  W(t))  =  A  f  [  [F(t  -  si,t  -  s2)  -  Ffc(f  -  -  si,f  -  s2)]ci(.si  A  s2), 

io  JO 

Cov(Wfc(i),  IT(f))  =  A  f  [  [(F(t  -  si,t  -  s2))2  -  Fk(t  -  sk)F(t  -  S!,t  -  s2)]d(si  As2), 

Jo  Jo 

where  Fk^(x,y)  :=  P(jfk  <  x,rfk,  <  y )  for  x,  y  E  M+,  and 

AF{x:y)  :=  F(y1,y2)  -  F(xl,y2 )  -  F(yi, x2)  +  F(xi, x2),  i,!/6K2+,  x  <  y. 

In  addition,  let  U  :=  {17(f)  :  t  E  M^_}  be  a  continuous  two-parameter  Gaussian  process  with 
mean  zero  and  covariance  function: 

Cov(U(s),U(t ))  =  (Fpefsi  A  ti)F2)C(s2  A  f2)  -  -Fl,e(si)i?2,e(s2)i?l,e(^l)i?2,e(i2)),  (3.39) 

for  s  :  =  (si,  s2)  £  and  t  :=  (fi,  t2)  E  M^_.  Define  Uk  :=  {Uk(t)  :  t  >  0},  for  k  =  1,2,  by 

Ui(t)  :=  U(t,oo),  U2(t)  :=  U(oo,t),  t>  0,  (3.40) 

and  without  abuse  of  notation,  we  denote  17(f)  =  U(t,t),  t  >  0.  Note  that  the  processes  II4 ,  Wk 
and  W  are  independent  with  U ,  as  well  as  Uk,  k  =  1,2. 

We  are  now  ready  to  state  the  FCLT. 

Theorem  3.2.  Under  Assumptions  1  and  4-8, 

(- An,Xn,Yn,Sn,En,Qn,Bn,Dn )  =>  (A,X,Y,S,E,Q,B,D)  (3.41) 

in  D14  as  n  — >•  oo,  where  A  is  in  (3.18),  X ,  Y  and  S  are  the  unique  solutions  to  the  following  set 
of  stochastic  integral  equations:  for  t  >  0  and  k  =  1,  2, 


Xk(t)  =  X°k(t)  -  Nk(3kFkfi{t)  -  J(0)V2l7fc(f)  -  Yki (0)1/2 Bo  k(Fk  e(t)) 

+  [\xk(t  ~  s))+dFk(s )  +  f  Ff(t  -  s)dA(s)  -  Wk(t ),  (3.42) 

Jo  Jo 

Yk(t )  =  Yg(t)  +  Nk/3kFk,e(t )  -  Tfc(0)1/25o>fe/(Ffc/)e(t))  +  J(0)1/2(t/fc(f)  -  17(f)) 

-  A*k(*  -  s))+dFk(s)  +  f  Fk(t  -  s)dA(s)  +  Wf(t)  -  if(t),  (3.43) 

Jo  Jo 

S(t )  =  5°(f)  +  y2(0)1/2B0,i(Nlie(f))  +  Yml/2Bo,2{F2,e(t))  +  J(0)1/2f/(f)  +  W{t)  +  (f), 

(3.44) 

^  77.  ^  ^  Tl  ^  TL 

and  E  ,  Q  ,  B  and  D  are  given  as  follows: 

Ek(t)  =  A(t )  -  (Xfc(f))+,  Afc(t)  =  Xk(0)  +  A(t)  -  Xk(t),  (3.45) 
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Qk(t)  =  ( Xk(t))+ ,  Bk(t )  =  ~{Xk(t))~, 

where 

X°k(t)  :=  Xk(0 )FcKe{t)  +  (Xk(0 ))+(Fkc(t)  -  Fke(t)),  (3.46) 

2 

s°(t)  :=  ^(Yk,(0)Fk,e(t)  +  Zk,(0)Fk{t)Fk,,e(t))  +  J(0)Fhe(t)F2,e(t)  +  J(0)Fm(t),  (3.47) 

k= 1 

*£(*)  :=  %{ 0)  +  Xfc(0)Ffc,e(t)  +  (Xfc(0))+(Ffc(i)  -  F*,c(i))  -  S°(f),  (3.48) 

the  processes  B$k  :=  {Bo  k(t)  :  t  >  0},  k  =  1,2,  are  independent  standard  Brownian  bridges,  the 
process  U  is  a  continuous  two-parameter  Gaussian  process  defined  above  with  the  processes  U\  and 
U2  defined  in  (3.40),  and  the  processes  Wk,  Wk  and  W  are  defined  in  (3.35),  (3.37)  and  (3.36), 
and  B(kk  is  independent  of  U  and  Wk,  Wf  and  W,  and  the  process  41  :=  {\P(t)  :  t  >  0}  defined  by 

V(t):=  [  [  £(t  -  si,t  -  s2)dF(si,s2),  (3.49) 

Jo  Jo 

is  a  well-defined  continuous  process,  where,  for  S\,S2  >  0, 

<S(si,s2)  :=  ia(si)l(si  <  S2)  +  F2(s2)l{s2  <  si)  +  (^i(si)  A  £’2(s2))1(si  =  S2)-  (3.50) 

We  remark  that  the  limit  processes  Xk,  k  =  1,2,  have  the  same  structure  as  the  unique  solution 
to  an  integral  convolution  equation,  as  shown  in  Reed  [45],  but  are  also  different  because  they  are 
both  driven  by  the  same  generalized  multiparameter  Kiefer  process  K  defined  in  Proposition  3.1. 
These  two  limiting  processes  Xk ,  k  =  1,  2,  are  correlated  because  of  the  correlated  service  times  of 
the  parallel  tasks  of  each  job,  which  is  captured  by  the  process  K ,  as  well  as  the  same  arrival  limit 
process  A.  In  fact,  these  two  processes  K  and  A  as  well  as  the  limits  associated  with  the  initial 
quantities  are  the  driving  stochastic  components  of  all  the  limit  processes  in  (3.42)-(3.45). 


4  Concluding  Remarks  and  Future  Work 

We  remark  on  the  main  ideas  of  the  proofs  for  the  limit  theorems  due  to  space  constraint.  The 
main  difficulty  in  the  study  of  many-server  fork-join  networks  with  NES  is  the  resequencing  of 
arrival  orders  after  service  completion  at  each  service  station.  Tasks  of  distinct  jobs  must  be 
differentiated  and  tracked  in  order  to  describe  the  waiting  buffer  dynamics  for  synchronization.  To 
mathematically  describe  the  system  dynamics,  we  develop  a  new  approach  using  multiparameter 
sequential  empirical  processes  driven  by  service  vectors  for  parallel  tasks  of  each  job,  as  depicted 
in  Figure  2.  This  approach  is  used  to  establish  FLLNs  and  FCLTs  for  the  waiting  buffer  processes 
for  synchronization  and  the  service  processes  jointly  in  the  fundamental  fork-join  network  where 
all  service  stations  are  operating  in  the  many-server  heavy-traffic  regimes. 

As  a  prerequisite,  we  first  establish  a  new  FCLT  for  multiparameter  sequential  empirical  pro¬ 
cesses  driven  by  random  vectors  (Theorem  2.1).  To  prove  Theorem  2.1,  we  employ  the  standard 
approach  of  establishing  convergence  of  finite-dimensional  distributions  and  tightness  [?,  20,  55]. 
The  convergence  of  finite-dimensional  distributions  follows  from  the  strong  convergence  result  of 
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multiparameter  empirical  processes  in  [42],  To  prove  tightness,  we  present  a  new  decomposition 
property  for  multiparameter  sequential  empirical  processes,  which  have  a  multiparameter  mar¬ 
tingale  [23,  19],  and  a  second  term  of  finite  variation.  We  apply  properties  of  multiparameter 
martingales  [23,  19]  and  strong  approximations  of  random  walks  by  Brownian  motions  (see  section 
3.5  in  [30])  to  show  the  tightness  of  those  two  decomposed  terms,  respectively.  This  decomposition 
also  plays  a  very  important  role  in  proving  the  tightness  of  the  number  of  tasks  in  each  waiting 
buffer  for  synchronization,  the  number  of  tasks  in  each  parallel  service  station  and  the  number 
of  synchronized  jobs.  Specifically,  the  aforementioned  processes  can  be  decomposed  into  a  linear 
combination  of  three  terms:  an  integral  functional  of  the  arrival  process  and  two  other  terms  from 
the  decomposition  of  the  multiparameter  sequential  empirical  process.  We  apply  Aldous’  tightness 
criteria  (see,  e.g.,  Lemma  3.7  in  [28])  and  another  tightness  criteria  for  processes  with  proper  decom¬ 
positions  satisfying  certain  conditions  (Lemma  VI.3.32  in  [20])  to  verify  the  tightness  property  of 
the  two  terms  related  to  the  sequential  empirical  process  driven  by  the  service  vector,  respectively. 

The  proofs  of  the  limit  theorems  in  the  QD  regime  can  be  regarded  as  generalizations  of  those 
for  G/GI/oo  queues  in  [28].  However,  since  all  the  processes,  X ,  Y  and  S',  are  represented  via 
the  multiparameter  sequential  empirical  processes  driven  by  the  service  vectors,  many  technical 
challenges  must  be  addressed  in  the  multiparameter  setting,  for  example,  using  multiparameter  L2 
martingales,  and  mean-square  limits  of  (integral  functionals  of)  multiparameter  processes  defined 
on  (k  >  2).  One  important  advantage  of  our  new  approach  is  that  all  the  diffusion-scale  limit 
processes  for  X,  Y  and  S  are  all  functionals  of  two  independent  processes  -  the  arrival  limit  and 
the  multiparameter  generalized  Kiefer  process  driven  by  the  service  vector  (Theorem  2.3).  From 
that,  the  characterization  of  the  joint  transient  and  stationary  distributions  of  these  processes  is 
made  possible  (Theorem  2.4). 

The  proofs  in  the  QED  regime  are  based  on  the  important  observations  that  the  system  dy¬ 
namics  of  G/GI/n  queues  can  be  represented  via  the  corresponding  G/GI/oo  service  dynamics 
[45],  and  that  waiting  times  in  the  QED  regime  are  0(l/\/n)  while  service  times  are  0(1).  For 
the  fork-join  network,  we  represent  the  dynamics  of  X ,  Y  and  S  via  that  in  the  corresponding 
infinite-server  fork-join  network  where  the  entering  service  times  in  the  model  are  regarded  as  the 
“arrival”  times  for  the  corresponding  infinite-server  fork-join  network,  as  shown  in  Figure  2(b). 
The  observation  that  the  entering  service  times  in  the  parallel  stations  have  a  difference  of  order 
0(1/ y/n)  is  key  to  prove  the  joint  convergence  of  the  aforementioned  processes.  On  the  other 
hand,  since  we  have  to  simultaneously  handle  the  waiting  times  of  all  parallel  tasks  and  work  with 
multiparameter  sequential  empirical  processes,  we  must  develop  new  techniques  to  prove  tightness, 
including  establishing  new  properties  for  multiparameter  L2  martingales,  and  identifying  a  new 
multivariate  integral  mapping  to  apply  the  continuous  mapping  theorem. 

We  believe  that  a  general  framework  has  been  developed  to  study  fork-join  networks  with 
NES  in  the  many-server  heavy-traffic  regimes  (QD  and  QED).  It  can  be  potentially  used  to  study 
performance  evaluation,  capacity  allocation,  and  control  problems  in  multi-class  fork-join  networks 
under  NES  with  multi-stage  processing.  We  want  to  find  optimal  scheduling  and  routing  policies 
such  that  delays  for  synchronization  as  well  as  delays  for  service  can  be  minimized,  particularly, 
reducing  delays  for  synchronization  to  be  of  a  smaller  order  than  service.  We  also  want  to  find 
optimal  staffing  policies  to  stabilize  delays  for  synchronization  in  addition  to  delays  for  service 
when  arrival  rates  are  time  inhomogeneous.  Our  methods  can  be  extended  to  investigate  reliability 
of  many-server  fork-join  networks  under  NES  in  random  environments  (e.g.,  service  disruptions). 
Fork-join  networks  with  NES  are  more  likely  to  suffer  from  service  disruptions  due  to  the  structural 
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complexity  of  parallel  and  sequential  task  processing.  Component-level  unreliability  can  be  much 
more  amplified  by  its  large  scale.  We  will  extend  our  approach  to  investigate  the  impact  of  service 
disruptions  in  one  or  multiple  service  stations  upon  system  congestion,  particularly,  delays  for 
synchronization  and  throughput. 
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